I recently needed to do some profiling of a largish C++ program, something I hadn’t done in a while, so I did a little research into the available free options. My first stop, of course, was the old standby gprof, which I quickly (though not before recompiling everything with -pg) abandoned because it still, after all these years, doesn’t seem to handle shared libraries properly. Some people on the net claim that it does, others think it doesn’t, but whatever the truth may be, I couldn’t get it working, and I wasn’t about to try statically linking the program. Moving on…
Other options I looked at included sprof (didn’t work with my nvidia driver for some reason), callgrind (crashed), and, finally, oprofile. I was initially skeptical of oprofile because a) it was not initially clear how to do a basic profiling run of a program and b) it wasn’t clear to me how to interpret the output I was getting. After the other tools failed, though, I gave oprofile a closer look and am now getting what I want from it.
Since I had some issues getting started with oprofile, I thought I’d describe how I’m using it. Hopefully this will make it easier for others to come up to speed with what is, in fact, a really cool tool.
Let me be clear that this is a very limited introduction to oprofile. All I wanted was to be able to run a program and get profiling information for it and a handful of associated shared libraries. oprofile’s complexity and feature set were part of what made it hard to get started with, so to fully appreciate it you’ll need to read more.
Once you’ve installed oprofile, it’s fairly painless to start getting profiling information. The basic flow is:
- modify the daemon settings
- start the oprofile daemon
- run your program
- flush the daemon’s output
- shutdown the daemon (optional)
- generate reports on your program
The oprofile daemon is the real magic in the system. It monitors various counters in your system to generate logs that are later correlated to your binaries with the reporting tool, opreport. Because it works at a low level, there is no need to compile your programs in a special way to get profiling information (notably unlike gprof).
More concretely, a typical run of oprofile for me looks like this:
```
# Only really need to do this once...the setting is remembered
opcontrol --no-vmlinux
# start the daemon
opcontrol --start
# run your program
my_program
# stop the daemon
opcontrol --stop
# flush any unflushed data
opcontrol --dump
# generate a report
opreport -l my_program mylib1 ../lib/lib*so
```
Note that you generally need root privileges to run opcontrol, so you might need to use sudo on the opcontrol commands.
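In practice that just means prefixing the daemon commands, for example:

```shell
# opcontrol manipulates the system-wide daemon, so it needs root
sudo opcontrol --no-vmlinux
sudo opcontrol --start
```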
This will produce a report of profiling information for symbols in my_program, mylib1, and every library in ../lib. You can vary the arguments to opreport if you want to change which symbols you see.
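For instance, you can narrow the report to a single binary, or drop the arguments entirely to see everything the daemon sampled (the library name here is a hypothetical example):

```shell
# Report only symbols from one shared library
opreport -l ../lib/libmylib1.so
# Report on everything oprofile sampled, system-wide
opreport -l
```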
It’s often critical to get callgraphs from a profiler. With oprofile, the first step for getting callgraphs is to configure the daemon to remember them by using the -c argument with your required call stack depth:
```
% opcontrol -c 10
```
Once you’ve done this, you do your profiling run as before, passing another -c to opreport (without specifying call stack depth):
```
% opreport -c -l my_program
. . .
```
When you put it all together, you get:
```
% opcontrol -c 10
% opcontrol --start
% my_program
% opcontrol --stop
% opreport -c -l my_program
```
Note that oprofile currently relies on frame pointers to do callgraphs, so you may not be able to use it on all architectures (e.g. apparently x86_64). This can sometimes be addressed with a compiler flag such as -fno-omit-frame-pointer.
And so much more…
Like I said, this just scratches the surface of what oprofile can do, but you can get quite far with it. You can get all of the information you might want about oprofile, including a manual and examples, from the oprofile website. I’ve just begun to use oprofile, so maybe I’ll post more about it as I learn more. In particular, I need to learn more about interpreting the reports, so perhaps that will show up here soon.