Sunday, May 12, 2019

Testing Lua "classes" speed

"Testing" is a bit too strong word for what I've done here, but numbers are still interesting.

I developed a "smart" reverse proxy recently where I decided to use OpenResty platform - it's basically Nginx + Lua + goodies. Lua is the first class language so theoretically you can implement anything with it.

After the couple of weeks I've spent with Lua, it strongly reminds me of JavaScript 5 - while it's a complete language, it's quite "raw" in the sense that, although it has the constructs to do anything, there is no standard (as in "industry-standard") way to do many things, classes being one of them. Coming from a strong Python background, I'm used to spending my time mostly on business logic, not googling around for the best third-party set/dict/etc. implementation. Many praise Lua's standard-library asceticism (which reminds me of similar sentiments from the JS 5 days), but most of the time I get paid to create products, not tools. The lack of a uniform way to do common tasks also results in a rather non-uniform code base.

Having said the above, I chose OpenResty. I already had Nginx deployed, so switching to OpenResty was a natural extension. It was exactly what I was looking for - a scriptable proxy - which is OpenResty's primary goal as a project. I didn't want to take a generic web server and write a middleware/plugin for it - that sounded a bit too adventurous and risky from a security perspective. So going back to the JS 5 days with a niche language like Lua was a good trade-off.

Eventually I came to like Lua. There is a special cuteness to it - I often find myself smiling while reading Lua code. In particular, it was a great relief after the Nginx IF evilness I had used before.

Let's get to the point of this post, shall we? While imbuing my proxy with some logic, I decided to check which class-like approach in Lua is the fastest. I ended up with three contenders:

  • Metatables
  • Closures
  • pl.class - part of the excellent Penlight Lua library, which aims to complement Lua with Python-inspired data types and utilities. This class implementation is also metatable-based but involves more internal boilerplate to support, e.g., inheritance.

I implemented a class to test object member access, method invocation, and method chaining. The code is in the gist.
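The gist itself isn't reproduced here, but the two approaches that need only the standard library can be sketched roughly as follows. The `Point` class, its fields, and its methods are my own illustration, not the gist's actual code; the pl.class variants additionally require Penlight to be installed, so they are omitted.

```lua
-- Metatable-based "class": methods live in one shared table that
-- instances delegate to through the __index metamethod.
local MetaPoint = {}
MetaPoint.__index = MetaPoint

function MetaPoint.new(x, y)
  return setmetatable({x = x, y = y}, MetaPoint)
end

function MetaPoint:getX()        -- colon syntax: implicit `self`
  return self.x
end

function MetaPoint:move(dx, dy)  -- returning self enables chaining
  self.x, self.y = self.x + dx, self.y + dy
  return self
end

-- Closure-based "class": each instance captures its own state and
-- carries its own copy of every method (hence the extra cost).
local function ClosurePoint(x, y)
  local self = {}
  function self.getX() return x end
  function self.move(dx, dy)
    x, y = x + dx, y + dy
    return self
  end
  return self
end

local a = MetaPoint.new(1, 2):move(3, 0)
local b = ClosurePoint(1, 2).move(3, 0)
print(a:getX(), b.getX())  -- both print 4
```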

Let's run it

I used the LuaJIT 2.1.0-beta3 that ships with the latest OpenResty Docker image. pl.class documents two ways to define a class, so I had two versions to see if there was any difference.
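The measurement loop amounts to something like the following - a simplified sketch, not the gist's exact harness; `bench`, `makePoint`, and the iteration count are my own names:

```lua
-- Rough benchmark harness: run `fn` in a tight loop and report calls/sec.
-- os.clock() measures CPU time and has coarse resolution, so the iteration
-- count must be large for the numbers to mean anything.
local function bench(name, iterations, fn)
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  local elapsed = math.max(os.clock() - start, 1e-9)  -- avoid division by zero
  local rate = iterations / elapsed
  print(string.format("%-10s %14.0f calls/sec", name .. ":", rate))
  return rate
end

-- Baseline: plain table construction, like the "Func" rows below.
local function makePoint(x, y)
  return {x = x, y = y}
end

bench("Func", 1e6, function() return makePoint(1, 2) end)
```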

Initialization speed


Func:       815,112,512 calls/sec
Metatable:  815,737,335 calls/sec
Closure:      2,459,325 calls/sec
PLClass1:     1,536,435 calls/sec
PLClass2:     1,545,817 calls/sec

Initialization + call speed


Metatable:  816,309,204 calls/sec
Closure:      2,104,911 calls/sec
PLClass1:     1,390,997 calls/sec
PLClass2:     1,453,514 calls/sec

We can see that Metatable is as fast as our baseline plain func. Also, with metatables, invocation does not affect speed - the JIT is probably doing an amazing job here (considering the code is trivial and predictable).

Closures are much slower, and invocation has a cost. pl.class, while the most syntactically rich, is the slowest of all and also takes a hit from invocation.

Conclusions

Being a casual Lua developer myself, I prefer the Closure approach:

  • It promotes composition
  • Easy to understand - no implicit self variable
  • More importantly, it's unambiguous to use - no one has to think about whether to access something with a dot or a colon
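That last point is easy to demonstrate. With a metatable class, `obj:method()` is just sugar for `obj.method(obj)`, so a dot call silently drops `self`; a closure object has only one calling convention. A sketch with illustrative names, not code from the gist:

```lua
-- Metatable class: `self` is passed implicitly by the colon syntax.
local Counter = {}
Counter.__index = Counter

function Counter.new()
  return setmetatable({n = 0}, Counter)
end

function Counter:inc()
  self.n = self.n + 1
  return self.n
end

local c = Counter.new()
print(c:inc())       -- 1: the colon passes `c` as `self`
print(pcall(c.inc))  -- false, ...: the dot call forgot `self`, so self.n errors

-- Closure object: state is captured, so there is no `self` to forget.
local function makeCounter()
  local n = 0
  return {
    inc = function()
      n = n + 1
      return n
    end,
  }
end

local k = makeCounter()
print(k.inc())       -- 1: only one way to call it
```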

Again, I'm a casual Lua developer. Had I spent more time with the language, I assume my brain would adjust to things like the implicit self, and maybe my recommendation would change.

For pure speed, metatables are the way to go, though I wonder what difference it would make in a real application (time your assumptions).

Out of curiosity, I did similar tests in Python (where there is one sane way to write this code). The results were surprising:

CPython3.7


Benchmarking init
Func:      18,378,052 ops/sec
Class:      4,760,040 ops/sec
Closure:    2,825,914 ops/sec
Benchmarking init+invoke
Class:      1,742,217 ops/sec
Closure:    1,549,709 ops/sec

PyPy3.6-7.1.1:


Benchmarking init
Func:   1,076,386,157 ops/sec
Class:    247,935,234 ops/sec
Closure:  189,527,406 ops/sec
Benchmarking init+invoke
Class:  1,073,107,020 ops/sec
Closure:  175,466,657 ops/sec

On CPython, if you want to do anything with your classes besides initializing them, there is not much difference between Class and Closure. "Func" aside, its performance is on par with Lua's.

PyPy just shines - its JIT outperforms LuaJIT by a wide margin. The fact that the speed of init+invoke on Class is similar to the raw Func benchmark says something about its ability to trace code that does nothing :)

On the emotional side

Don't believe benchmarks - lies, damn lies, and benchmarks :)

Seriously though, before thinking "why didn't they embed Python?", other aspects should be considered:

  • Memory. Lua uses much less of it. An array of 10 million strings of 10 bytes each weighs about 400 MB in Lua versus 700+ MB in CPython/PyPy.
  • Python was originally a synchronous language, with async support introduced much later. Nginx is an async server, so Lua fits there more naturally - but I'm speculating here.
  • Everyone says that Lua is much easier to embed.

Finally, both can do amazing things through FFI.
