
2019-04-30 16:00:00

Exploring wasm2cil performance

(What is wasm2cil? See my previous blog entry.)

I've been trying to do some measurements to get a rough idea of performance for assemblies produced by wasm2cil.

A VERY rough idea. Take these numbers with a grain of salt.

Brotli compression

For a rather CPU-intensive test, the following Brotli library:

https://github.com/dropbox/rust-brotli

... makes a nice test case because it compiles to wasm32-unknown-wasi with no problems:

erics-mac-mini:rust-brotli eric$ cargo +nightly build --release --target=wasm32-unknown-wasi

I made a small code change to write out the elapsed time after compressing something.

My test case is to measure the time required to compress sqlite3.c (the source code for SQLite in a single 7.4 MB "amalgamation" file).

First, I ran the original Rust version natively:

erics-mac-mini:rust-brotli eric$ cargo run --release --bin brotli -- -c sqlite3.c s1
    Finished release [optimized] target(s) in 0.19s
     Running `target/release/brotli -c sqlite3.c s1`
elapsed: 17

Second, I ran brotli.wasm through wasmtime:

erics-mac-mini:wasm2cil eric$ wasmtime --dir=. brotli.wasm -- -c sqlite3.c s1
elapsed: 81

Finally, I ran brotli.wasm through my own wasm2cil:

erics-mac-mini:tool eric$ dotnet run -- run ../brotli.wasm -- -c ../sqlite3.c s1
elapsed: 37

Summarized results:

Implementation   Elapsed time (sec)
native           17
wasmtime         81
wasm2cil         37

So the wasm2cil version takes a bit over twice as long as the native code (37 vs. 17 seconds, roughly 2.2x). Hopefully I can narrow that gap with more focused work on performance.

It's a little surprising that wasmtime came in so much slower. Or maybe it's not. The comparison here has a little bit of apples-to-oranges going on.

First of all, WebAssembly memory accesses are supposed to be range-checked, and wasm2cil isn't currently doing that, while wasmtime probably is.

Second, we're probably seeing a basic difference in maturity between the Cranelift JIT (which is fairly young) and .NET Core 2.2.

Third, I could be using wasmtime incorrectly, although I did try the --optimize flag, and it didn't help.

Finally, this is just one benchmark. In the next one, wasmtime does very well indeed.
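To make the first point above concrete, here is a rough sketch in C of the extra work a range check adds to every load. It is an illustration only: linear_memory, load_i32_unchecked and load_i32_checked are hypothetical names, and this is not how wasm2cil or wasmtime actually implement memory access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a wasm linear memory: a byte buffer plus its size. */
typedef struct {
    uint8_t *bytes;
    size_t   size;   /* current size in bytes */
} linear_memory;

/* Unchecked i32 load: trusts the address completely.
 * Assumes a little-endian host, since wasm memory is little-endian. */
int32_t load_i32_unchecked(const linear_memory *m, uint32_t addr, uint32_t offset)
{
    assert(m != NULL);
    int32_t v;
    memcpy(&v, m->bytes + (uint64_t)addr + (uint64_t)offset, sizeof v);
    return v;
}

/* Checked i32 load: returns false where a real engine would trap. */
bool load_i32_checked(const linear_memory *m, uint32_t addr, uint32_t offset,
                      int32_t *out)
{
    assert(m != NULL && out != NULL);
    /* Compute the effective address in 64 bits so addr + offset cannot wrap. */
    uint64_t ea = (uint64_t)addr + (uint64_t)offset;
    if (ea + sizeof(int32_t) > (uint64_t)m->size) {
        return false;   /* out of bounds */
    }
    memcpy(out, m->bytes + ea, sizeof *out);
    return true;
}
```

The checked version adds one 64-bit add, one compare and one branch per access. In a memory-heavy workload like compression, that is the kind of cost an implementation pays when it does the check in software and skips when it does not.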

SQLite

To do some measurements with SQLite, I wrote a little C program that uses the sqlite3 API to construct a table and query a subset of the rows. Specifically, given parameters (count), (first) and (last), it inserts (count) rows, where each row has two columns: the loop index and the loop index squared. After the inserts are done, it does a SELECT sum() on the squares column using (first) and (last) as the range of rows.

In other words, the test program uses SQLite to calculate something basically equivalent to this:

int total = 0;
for (int i=0; i<count; i++)
{
    if ((i >= first) && (i <= last))
    {
        total += i * i;
    }
}
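Each run below prints an elapsed time in milliseconds. The post doesn't show how the test program measures it, so the following is only one plausible way to do it in C, using the POSIX monotonic clock. The helper names elapsed_ms and time_call_ms are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds between two clock readings; end must not precede start. */
int64_t elapsed_ms(struct timespec start, struct timespec end)
{
    int64_t sec  = (int64_t)end.tv_sec  - (int64_t)start.tv_sec;
    int64_t nsec = (int64_t)end.tv_nsec - (int64_t)start.tv_nsec;
    int64_t total_ns = sec * 1000000000LL + nsec;
    assert(total_ns >= 0);
    return total_ns / 1000000;   /* truncate to whole milliseconds */
}

/* Run fn(arg) once and return the wall-clock time it took, in ms. */
int64_t time_call_ms(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

A monotonic clock is used here so that adjustments to the system clock during a long run cannot skew the result.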

The main() for this test case takes the name of the SQLite database file on the command line, followed by the count, first and last parameters. I did three runs with each implementation, using 1,000, 10,000 and 100,000 rows.

Here are those runs for the native code:

erics-mac-mini:reference eric$ ./a.out z1 1000 200 400
filename=z1  count=1000  first=200  last=400
elapsed: 472 ms
rc: 18766700

erics-mac-mini:reference eric$ ./a.out z2 10000 2000 4000
filename=z2  count=10000  first=2000  last=4000
elapsed: 4618 ms
rc: 1496797816

erics-mac-mini:reference eric$ ./a.out z3 100000 20000 40000
filename=z3  count=100000  first=20000  last=40000
elapsed: 46668 ms
rc: 1738801584

And for wasmtime:

erics-mac-mini:sqlite3 eric$ wasmtime --dir=. sqlite3.wasm -- w1 1000 200 400
filename=w1  count=1000  first=200  last=400
elapsed: 534 ms
rc: 18766700

erics-mac-mini:sqlite3 eric$ wasmtime --dir=. sqlite3.wasm -- w2 10000 2000 4000
filename=w2  count=10000  first=2000  last=4000
elapsed: 5292 ms
rc: 1496797816

erics-mac-mini:sqlite3 eric$ wasmtime --dir=. sqlite3.wasm -- w3 100000 20000 40000
filename=w3  count=100000  first=20000  last=40000
elapsed: 53024 ms
rc: 1738801584

And for wasm2cil:

erics-mac-mini:tool eric$ dotnet run -- run ../sqlite3/sqlite3.wasm -- c1 1000 200 400
filename=c1  count=1000  first=200  last=400
elapsed: 1860 ms
rc: 18766700

erics-mac-mini:tool eric$ dotnet run -- run ../sqlite3/sqlite3.wasm -- c2 10000 2000 4000
filename=c2  count=10000  first=2000  last=4000
elapsed: 5411 ms
rc: 1496797816

erics-mac-mini:tool eric$ dotnet run -- run ../sqlite3/sqlite3.wasm -- c3 100000 20000 40000
filename=c3  count=100000  first=20000  last=40000
elapsed: 42700 ms
rc: 1738801584

The printed sum result is the same in each case, and for each set of parameters the resulting SQLite database files are byte-for-byte identical across the three implementations.
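As a sanity check on those printed values, the sums can be recomputed without SQLite. The sketch below (hypothetical helper names sum_of_squares and low32) accumulates in 64 bits and then keeps the low 32 bits. For the first run the exact sum is 18,766,700, which matches the output directly. For the two larger runs the exact sums are 18,676,667,000 and 18,667,666,670,000, and the printed values equal those sums modulo 2^32. That suggests a 32-bit integer is involved somewhere in the test program, but that is an inference from the numbers, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Exact sum of i*i for first <= i <= last, accumulated in 64 bits. */
uint64_t sum_of_squares(uint32_t first, uint32_t last)
{
    assert(first <= last);
    uint64_t total = 0;
    for (uint64_t i = first; i <= last; i++) {
        total += i * i;
    }
    return total;
}

/* Keep only the low 32 bits, as storing into a 32-bit integer would. */
uint32_t low32(uint64_t x)
{
    return (uint32_t)(x & 0xFFFFFFFFu);
}
```

Since all three implementations print the same wrapped values, the wraparound does not affect the timing comparison; it only means the two larger results are not the true sums.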

To summarize the results, as elapsed time for each run, in milliseconds:

Implementation   1,000 rows   10,000 rows   100,000 rows
native                  472         4,618         46,668
wasmtime                534         5,292         53,024
wasm2cil              1,860         5,411         42,700

Weird.

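The results for wasmtime are consistently about 13% to 15% higher than native, which is pretty impressive I think.

But the results for wasm2cil are all over the place: almost 4 times the native time for 1,000 rows, about 17% slower for 10,000 rows, and about 8.5% less time than native for 100,000 rows.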

It seems bizarre for wasm2cil to be faster than the native code for ANY test, so for now I'm going to assume this is a defect in my approach. I did repeat the runs and got similar results each time. So this is an interesting mystery.
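For anyone who wants to redo the arithmetic, here is a small C sketch that expresses each elapsed time as a multiple of the native time for the same row count. The array and function names are made up for this example; the numbers are copied from the summary table above.

```c
#include <assert.h>
#include <stdio.h>

/* Elapsed times in ms for 1,000 / 10,000 / 100,000 rows, from the table above. */
const double native_ms[3]   = {  472.0, 4618.0, 46668.0 };
const double wasmtime_ms[3] = {  534.0, 5292.0, 53024.0 };
const double wasm2cil_ms[3] = { 1860.0, 5411.0, 42700.0 };

/* 1.0 means same speed as native, 2.0 means twice as slow, below 1.0 means faster. */
double relative_to_native(double elapsed, double native)
{
    assert(native > 0.0);
    return elapsed / native;
}

/* Print one row of ratios for an implementation. */
void print_ratios(const char *name, const double elapsed[3])
{
    printf("%-10s", name);
    for (int i = 0; i < 3; i++) {
        printf(" %6.2fx", relative_to_native(elapsed[i], native_ms[i]));
    }
    printf("\n");
}
```

Calling print_ratios once with wasmtime_ms and once with wasm2cil_ms prints one row per implementation. The wasmtime row stays nearly flat across row counts, while the wasm2cil row falls steadily as the row count grows.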

Bottom line

There is no bottom line. Not yet.

Well actually, I consider it good news that these test runs work (without crashing or incorrect results), and that timing comparisons are decent (in the same order of magnitude as native).

Beyond that level of precision, I consider this data interesting as long as I'm not drawing big-picture conclusions from it. These results raise more questions than they answer. Mostly I can use the information to guide my efforts to improve wasm2cil.