Thursday, June 4, 2015

Do we really need to keep writing Python async adapters for every database?

I've entered the realm of async programming in Python about two years ago. I've started with Tornado framework and my first task was to design REST API against backend powered by ElasticSearch. Luckily, ElasticSearch speaks HTTP and Tornado comes with async http client - no worries. Later on, I needed to speak to Postgres and found Momoko project that wraps async functionality of psycopg2 for Tornado. It was a bit rusty and dusty, but with some patching worked well.

Next came MySQL... well, this is when I started to ask myself questions. The thing is that I did not find anything stable for MySQL. Tornado wiki lists couple of projects - first has unclear license and too many bold warnings, the other one did not had any releases since 2009. At the end I've ended up using ad-hock pooling software based on good old MySQLdb, plus ThreadExecutor from concurrent.futures package. Fortunately my project did go further then prototype.

Time went on as I decided to rewrite Momoko to use Tornado's new Future interface. After rewrite I run some benchmarks and the results were quite surprising. In fact async code introduces quite a bit of overhead. So if your database is very fast and you don't open huge amount of connections (less then 100), async code has actually more overhead then threaded code. And obviously you are not going to open hundreds of database connections from a single Tornado process. I.e. in real life, connection handling overhead will be negligible compared to database response.

So do we really need native async adapters for every SQL database? Looks like we don't. And that's probably why there are no any Tornado adapters for neither MSSQL nor Oracle.

What I really liked about Momoko is its connection pooling. It behaves like... well, like a good connection pooling that you get from SQLAchemy for example. So I think the ultimate solution for async connectivity to SQL databases would be async connection pool project that would expose native Future-based interface and use ThreadExecutor behind the scenes to utilize any Python DB2ABI driver. This will unleash async power to any of the existing DB2API modules.

Sounds like an idea for the next open source project. Unless there already exist one. Did I miss it?