Lately I’ve been exploring ways to improve the performance of some areas of code where embedded python calls are made from C++. A common and obvious (and often effective) approach for increasing performance in these areas is to forgo python and rewrite performance-critical areas in C/C++. However, I got curious about other, less invasive ideas, and that led me to thinking about how python methods can be accessed from C++.
A reasonable way to call a python function from C++ might look like this (using boost.python):
```cpp
#include <boost/python.hpp>
using namespace boost::python;

// import the module
object obj = import("my_module").attr("MyClass")();

// call the function
obj.attr("some_method")();
```
This works just fine. But what if you need to call some_method repeatedly, say, 100000 times? In many embedded python scenarios, maybe most, you can identify some collection of python methods that are called many times from C++. This raises the question of whether you can lower the cost of these calls.
One approach to lowering the cost per call that I experimented with provided some useful results. In the code above, you’ll notice that each call to some_method requires a call to obj.attr() just to access the method object. What if we “cached” that method object and called it directly rather than refetching it? For example:
```cpp
// import the module
object obj = import("my_module").attr("MyClass")();

// cache the method object
object method = obj.attr("some_method");

// call the method object a bunch of times
for (int i = 0; i < 1000000; ++i)
    method();
```
This is not very invasive or onerous, requiring no change to the python code and very little change to the C++ code. The question is, what does it buy you in terms of performance?
To try to measure this, I wrote a C++ program that makes repeated calls to a python method using first the “standard” access method (going through attribute lookup for each call) and then the “interned” access method (caching the method object). In the basic experiment, the python method simply executed pass and did no real work, so the measured time for the calls is purely the call overhead. This graph shows the results:
As you can see, it takes about twice as long to call a function using attribute access as it does to make the interned call. Of course, the “slow” method access takes something like 0.001 milliseconds per call, but nevertheless, double the performance is nothing to sneeze at.
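If you want to reproduce the shape of this experiment without setting up a boost.python build, a rough pure-python analogue looks like this. The class and method names here are invented for illustration, and absolute numbers will of course differ from the C++ measurements, but the same two access patterns are compared:

```python
import time

class MyClass:
    def some_method(self):
        pass  # no real work, so timing reflects call overhead only

obj = MyClass()
N = 1_000_000

# "standard" access: attribute lookup on every call
start = time.perf_counter()
for _ in range(N):
    obj.some_method()
standard = time.perf_counter() - start

# "interned" access: fetch the bound method once, call it directly
method = obj.some_method
start = time.perf_counter()
for _ in range(N):
    method()
interned = time.perf_counter() - start

print(f"standard: {standard:.3f}s  interned: {interned:.3f}s")
```

The absolute gap is smaller in pure python than across the C++/python boundary, but the per-call lookup cost shows up in the same way.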
The next obvious question is when does this kind of method caching pay off? In other words, how much work does a function have to do before the cost of method access becomes irrelevant? Clearly, if a function takes 10 minutes to run, then saving a fraction of a millisecond on call overhead is not likely to be something you want to optimize.
To get a handle on this, I modified the called python method to perform some amount of “real” work. In the case of the following graph, each call to the python method included 100 multiplications:
I decreased the number of sample points because, of course, this experiment took much longer per sample than the previous one. In this case, the overhead per call, while still apparent, is dwarfed by the work of the function.
So the question of whether this kind of optimization is fruitful would seem to depend largely on two factors:
- How often the method is called
- How much work the method itself does
If the method itself does very little and it’s called many times, then caching the method object might be a great idea. If it does a lot of work and is only called a few times, you will still see a minor performance improvement, but it’s hard to argue that the extra complexity and brittleness are worth it.
Speaking of brittleness, it’s worth noting that caching a method like this does introduce a new kind of dependency between the C++ and python code. In general, a call to attr() to get a method object does not need to return the same or even roughly equivalent object each time. The return value could be any callable and could result from any arbitrary calculation that the python code wants to make. In other words, to use this caching optimization you need to be certain that a cached method reference is semantically equivalent to repeated calls to attr(). Very often it’s safe to make this assumption, but of course it’s also possible for the semantics of a particular attribute access to change silently when the python code changes, so you need to be aware and vigilant if you use this approach.
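The hazard is easy to demonstrate in a few lines of pure python. The class below is deliberately contrived: its attribute access returns a fresh callable each time, so a cached reference and repeated lookups genuinely diverge:

```python
class Flaky:
    """Attribute access returns a different callable on every lookup."""

    def __init__(self):
        self.calls = 0

    @property
    def some_method(self):
        self.calls += 1
        n = self.calls
        return lambda: n  # a fresh callable per lookup

obj = Flaky()

# repeated attribute access sees the changing behavior...
assert obj.some_method() == 1
assert obj.some_method() == 2

# ...but a cached reference is frozen at lookup time
cached = obj.some_method  # third lookup
assert cached() == 3
assert cached() == 3  # stays 3, no matter how often it's called
```

Real code is rarely this perverse, but anything routed through a property, `__getattr__`, or a decorator that rebinds methods can behave this way without the C++ side ever noticing.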
Finally, you’ll notice that you can equally easily use this approach in a purely python program. Nothing prevents you from caching a method object in python to try to get the same performance benefits as described here. I haven’t done any tests to see how this might pan out, but it’s probably worth experimenting with.
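As a starting point for that experiment, the standard timeit module makes the pure-python comparison a one-liner per case (again, the class here is invented for illustration):

```python
import timeit

class MyClass:
    def some_method(self):
        pass

obj = MyClass()
cached = obj.some_method  # bound method, cached once

# per-call attribute lookup vs. calling the cached bound method
lookup_time = timeit.timeit(lambda: obj.some_method(), number=500_000)
cached_time = timeit.timeit(lambda: cached(), number=500_000)

print(f"per-call lookup: {lookup_time:.3f}s  cached: {cached_time:.3f}s")
```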