Wednesday, April 22, 2015

Python functions - partial vs inline

When doing async programming, it is very natural to write code like this:
def _connect_one(self, conn):
    future = Future()
    def on_connect(fut):
        # Do something useful here
        pass

    self.ioloop.add_future(conn.connect(), on_connect)
    return future
I define the callback function inline, which gives me easy access to local variables through the closure. An alternative is to factor the callback out into a separate method and use partial to bind its input arguments:
from functools import partial

def _on_connect(self, orig_future, conn, future):
    # Do something useful here
    pass

def _connect_one(self, conn):
    future = Future()
    self.ioloop.add_future(conn.connect(), partial(self._on_connect, future, conn))
    return future
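As a minimal sketch of how the binding works (the function and the string values here are placeholders for illustration, not part of the real connection code): partial stores the leading positional arguments, and whatever the caller passes later is appended after them. That is why the future handed over by the IO loop arrives as the last parameter of the callback.

```python
from functools import partial

def on_connect(orig_future, conn, connect_future):
    # Arguments bound by partial arrive first; the caller-supplied one comes last.
    return (orig_future, conn, connect_future)

# Bind the first two arguments now; the third is supplied at call time.
callback = partial(on_connect, "orig", "conn")
result = callback("connected")
```

The bound arguments can also be inspected afterwards through callback.func and callback.args.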
To some this may look more elegant, especially if your real-life callbacks grow large. However, I was only interested in finding out which way is faster. To test this, I created this simple piece of code:
from __future__ import print_function
from functools import partial
import time

class Runner(object):
    def inline(self):
        a = "foo"
        b = "bar"
        def go():
            c = 2
            d = 3
            if c < d:
                e = a + b
        go()  # actually invoke the closure so the call is measured

    def go(self, a, b):
        c = 2
        d = 3
        if c < d:
            e = a + b

    def explicit(self):
        self.go("foo", "bar")

    def partial(self):
        partial(self.go, "foo", "bar")()

def run(what, count):
    start = time.time()
    c = count
    while c:
        what()  # call the function under test on every iteration
        c -= 1
    print("%s took %.2f seconds for %s iterations" % (what.__name__, time.time() - start, count))

runner = Runner()
run(runner.explicit, 10000000)
run(runner.inline, 10000000)
run(runner.partial, 10000000)
I've run it in Python 2, Python 3 and PyPy and here are the results:

Python 2.7.8
explicit took 3.02 seconds for 10000000 iterations
inline took 4.74 seconds for 10000000 iterations
partial took 4.90 seconds for 10000000 iterations
Python 3.4.2
explicit took 2.73 seconds for 10000000 iterations
inline took 3.84 seconds for 10000000 iterations
partial took 4.13 seconds for 10000000 iterations
PyPy 2.5.1
explicit took 0.03 seconds for 10000000 iterations
inline took 0.16 seconds for 10000000 iterations
partial took 0.15 seconds for 10000000 iterations
I was surprised that on CPython inline was actually faster than partial, considering that partial is implemented in C there. It seems that it takes less time for Python to skim through an existing function definition than to create a function wrapper. The explicit version is the fastest one, obviously. But what matters most is that all three versions are fast enough for real-life applications - any of them can do more than 2 million calls per second on my dusty laptop.
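For anyone who wants to repeat the measurement, here is a rough sketch of the same comparison using the standard timeit module, which takes the best of several runs. The module-level function names and the iteration counts are my own choices for illustration, and the numbers will differ from the ones above since the loop overhead is different.

```python
import timeit
from functools import partial

def go(a, b):
    c = 2
    d = 3
    if c < d:
        return a + b

def explicit():
    # Plain call with the arguments passed directly.
    return go("foo", "bar")

def inline():
    # A closure is created and called on every invocation.
    a = "foo"
    b = "bar"
    def inner():
        c = 2
        d = 3
        if c < d:
            return a + b
    return inner()

def with_partial():
    # A partial object is created and called on every invocation.
    return partial(go, "foo", "bar")()

def measure(func, number=100000, repeat=3):
    # The minimum of several runs is the least noisy estimate.
    return min(timeit.repeat(func, number=number, repeat=repeat))

timings = {f.__name__: measure(f) for f in (explicit, inline, with_partial)}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("%s: %.4f s" % (name, seconds))
```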

P.S. As usual, PyPy renders lots of micro-measurements like this irrelevant :)