Speed Benchmarks
Monocypher ships with a couple of benchmarks. Run them on your platform if you’re not sure Monocypher is fast enough. There are also benchmarks for libsodium, TweetNaCl, and LibHydrogen so you can compare.
All results are presented in megabytes per second, or in operations per second (“13.5K” means 13500 operations per second). To avoid a false sense of accuracy, most reported numbers are rounded to two significant digits.
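The rounding convention for operations per second can be sketched in a few lines of Python. This is only an illustration of the notation, not part of Monocypher's benchmark code: the helper names `round_sig` and `fmt_ops` are made up for this example, and switching to the K suffix at 10000 is an assumption based on how the tables below look.

```python
from math import floor, log10

def round_sig(value, digits=2):
    """Round a positive number to `digits` significant digits."""
    if value <= 0:
        raise ValueError("expected a positive number")
    exponent = floor(log10(value))          # position of the leading digit
    step = 10 ** (exponent - digits + 1)    # size of the last digit we keep
    return round(value / step) * step

def fmt_ops(value):
    """Format an operations-per-second figure: two digits, K for thousands."""
    rounded = round_sig(value)
    if rounded >= 10000:                    # assumed threshold for the K suffix
        return f"{round(rounded / 1000)}K"
    return f"{rounded:g}"

# Raw figures taken from the "Raw data" section of this page.
for raw in (8124, 14418, 20688):
    print(raw, "->", fmt_ops(raw))
```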
Overview
The following tables compare the speed of Monocypher, libsodium, TweetNaCl, LibHydrogen, and c25519 on my 64-bit Skylake Core i5 Intel CPU, running Ubuntu 18.04. Everything is compiled with Ubuntu’s GCC 7.4.0. libsodium is compiled with the default options, as recommended by the installation page. Everything else uses -O3 -march=native.
              +------+------+------+------+------+-------+
     x86      | AEAD | Hash |  Pw  | Key  | Sig  | Check |
      64      |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| Monocypher  |  307 |  683 |  511 | 8100 |  14K |  6000 |
| libsodium   | 1000 |  870 |  701 |  21K |  33K |   13K |
| TweetNaCl   |   51 |   40 |      | 1800 |  650 |   330 |
| LibHydrogen |   94 |  162 |      |      | 9200 |  5500 |
+-------------+------+------+------+------+------+-------+
The same speeds, relative to Monocypher:
              +------+------+------+------+------+-------+
     x86      | AEAD | Hash |  Pw  | Key  | Sig  | Check |
      64      |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| Monocypher  |  307 |  683 |  511 | 8100 |  14K |  6000 |
| libsodium   | ×3.4 | ×1.3 | ×1.4 | ×2.5 | ×2.3 |  ×2.2 |
| TweetNaCl   | ÷6.0 |  ÷17 |      | ÷4.5 |  ÷22 |   ÷19 |
| LibHydrogen | ÷3.3 | ÷4.2 |      |      | ÷1.6 |  ÷1.1 |
+-------------+------+------+------+------+------+-------+
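For readers who want to redo the relative figures from their own runs, here is a minimal Python sketch of the ×/÷ notation. The function name `relative` is hypothetical, and the one-decimal-below-ten formatting is inferred from the table rather than taken from any Monocypher tooling. It assumes higher numbers are better, which holds for the throughput figures here.

```python
def relative(baseline, other):
    """Express `other` relative to `baseline` for higher-is-better figures.

    "×" means `other` is that many times faster than the baseline,
    "÷" means it is that many times slower.
    """
    if other >= baseline:
        sign, ratio = "×", other / baseline
    else:
        sign, ratio = "÷", baseline / other
    # Two significant digits: one decimal below 10, none from 10 upwards.
    text = f"{ratio:.1f}" if ratio < 9.95 else f"{ratio:.0f}"
    return sign + text

# Authenticated encryption, in megabytes per second (from the raw data).
print(relative(307, 1048))   # libsodium against Monocypher
print(relative(307, 51))     # TweetNaCl against Monocypher
```

For lower-is-better measurements such as verification time or stack size, swap the two arguments.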
Unsurprisingly, libsodium is the fastest of them all, thanks to its use of vector instructions and 128-bit arithmetic. If you want speed on desktops and servers, this is the one.
Despite restricting itself to portable C code, Monocypher does not lag too far behind. Authenticated encryption can’t keep up with libsodium’s excellent vector implementation (from Dolbeau), but the rest is more tolerable.
TweetNaCl is almost exclusively optimised for source code size. Performance wasn’t a consideration, so its slow speed is not surprising. Note that part of the poor performance of hashing comes from using SHA-512, which is slower than BLAKE2b.
LibHydrogen is mostly meant for constrained environments, but has been included here anyway: because it is incompatible with libsodium, using it on IoT devices likely means using it on the server as well. Its relatively poor performance on symmetric crypto is mostly explained by the choice of the Gimli permutation, which is slower than ARX designs like ChaCha20 and BLAKE2b when implemented in software (hardware implementations are more efficient).
Effect of compilation options
The -O3 -march=native flags are the most aggressive. Not everyone approves of these. Here’s how changing flags affects Monocypher:
              +------+------+------+------+------+-------+
     x86      | AEAD | Hash |  Pw  | Key  | Sig  | Check |
      64      |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| -O3 -native |  307 |  683 |  511 | 8100 |  14K |  6000 |
| -O3         |  98% |  95% |  87% |  99% |  99% |   98% |
| -O2         |  90% |  84% |  72% |  94% |  85% |   92% |
| -Os         |  76% |  67% |  66% |  93% |  81% |   91% |
+-------------+------+------+------+------+------+-------+
(Note: if -Os is used with -DBLAKE2_NO_UNROLLING to reduce BLAKE2 code size even further, BLAKE2 performance drops to 57%.)
Sticking to portable instructions has almost no effect, and optimising for size is mostly tolerable. Be careful about password hashing though: if it runs slower, people will lower its security to compensate.
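The password hashing warning can be made concrete with a little arithmetic. The sketch below assumes that the Argon2i figure (megabytes per second at 3 passes) tells you how much memory can be hashed per second of wall time, and that the application has a fixed time budget per login. The 0.5 second budget and the function name `affordable_memory_mb` are hypothetical and chosen only for illustration.

```python
def affordable_memory_mb(rate_mb_per_s, time_budget_s):
    """Memory (in MB) that fits in the time budget at the given hashing rate."""
    return rate_mb_per_s * time_budget_s

budget = 0.5  # hypothetical time budget per password check, in seconds

# Argon2i rates from the raw data: -O3 -march=native versus -Os.
fast = affordable_memory_mb(511, budget)
small = affordable_memory_mb(337, budget)

print(f"-O3 -march=native: {fast:.1f} MB within {budget} s")
print(f"-Os              : {small:.1f} MB within {budget} s")
print(f"ratio            : {small / fast:.0%}")
```

Under these assumptions, a slower build leaves the user with two options: wait longer, or hash less memory. The second one is the security loss the paragraph above warns about.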
R-Pi overview (based on old benchmarks)
This comparison uses Monocypher 2.0.0 and libsodium 1.0.16. These benchmarks should be redone.
             +--------+--------+--------+-------+--------+
    R-Pi     |  AEAD  |  Hash  |   Pw   |  Key  |  Sig   |
             |        |        |  hash  | exch  |        |
+------------+--------+--------+--------+-------+--------+
| Monocypher | 32MB/s | 26MB/s | 19MB/s | 680/s | 1310/s |
| libsodium  |   156% |   100% |   100% |  101% |   130% |
| TweetNaCl  |    22% |    42% |        |   11% |     3% |
+------------+--------+--------+--------+-------+--------+
Third party benchmarks
A paper by Zandberg et al. compares various cryptographic libraries in a constrained environment. They concentrate on firmware updates, for which signature verification is often a bottleneck.
They report the following:
- The time it takes to verify a signature (seconds),
- The stack size (kilobytes),
- and the binary size on the flash (kilobytes).
They evaluated version 2.0.5 of Monocypher (version 2.0.6 performs the same but uses less stack). Numbers are rounded for readability; see the paper for the raw data. Smaller is better.
                        +------------+-------+--------+
       Cortex M0+       | Signature  | Stack | Binary |
                        | check time | size  |  size  |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |       .53s | 5.2KB |   13KB |
|         | HACL*       |       7.1s | 3.2KB |   17KB |
|         | TweetNaCl   |       8.0s | 3.8KB |  5.6KB |
| Ed25519 | uNaCl       |       8.1s | 3.8KB |  5.6KB |
|         | C25519      |       4.2s | .98KB |  4.6KB |
|         | WolfSSL     |       3.7s | 1.3KB |  5.7KB |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       1.1s | .60KB |  5.0KB |
|         | Mbed TLS    |       1.6s | .79KB |   17KB |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |      0.13s | .49KB |   15KB |
|         | LibHydrogen |       1.1s | .49KB |  2.2KB |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
       Cortex M3        | Signature  | Stack | Binary |
                        | check time | size  |  size  |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |      .072s | 5.1KB |   10KB |
|         | HACL*       |       1.5s | 3.3KB |   19KB |
|         | TweetNaCl   |       2.0s | 3.8KB |  5.6KB |
| Ed25519 | uNaCl       |       1.8s | 3.8KB |  5.5KB |
|         | C25519      |       3.3s | 1.0KB |  4.8KB |
|         | WolfSSL     |       2.7s | 1.3KB |  5.9KB |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       .44s | .68KB |  4.9KB |
|         | Mbed TLS    |       1.1s | .80KB |   15KB |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |       1.9s | .79KB |   12KB |
|         | LibHydrogen |       .22s | .47KB |  2.2KB |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
       Cortex M4        | Signature  | Stack | Binary |
                        | check time | size  |  size  |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |      .045s | 5.1KB |   10KB |
|         | HACL*       |       1.3s | 3.3KB |   19KB |
|         | TweetNaCl   |       1.5s | 3.8KB |  5.6KB |
| Ed25519 | uNaCl       |       1.5s | 3.8KB |  5.5KB |
|         | C25519      |       1.9s | 1.0KB |  4.8KB |
|         | WolfSSL     |       1.7s | 1.3KB |  5.9KB |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       .35s | .66KB |  4.9KB |
|         | Mbed TLS    |       .84s | .80KB |   15KB |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |       1.3s | .97KB |   12KB |
|         | LibHydrogen |       .24s | .44KB |  2.2KB |
+---------+-------------+------------+-------+--------+
A relative comparison gives a better sense of scale:
                        +------------+-------+--------+
       Cortex M0+       | Signature  | Stack | Binary |
                        | check time | size  |  size  |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |      530ms |  5200 |  13000 |
|         | HACL*       |        ×13 |  ÷1.6 |   ×1.3 |
|         | TweetNaCl   |        ×15 |  ÷1.4 |   ÷2.3 |
| Ed25519 | uNaCl       |        ×15 |  ÷1.4 |   ÷2.3 |
|         | C25519      |       ×7.9 |  ÷5.3 |   ÷2.7 |
|         | WolfSSL     |       ×6.9 |  ÷4.0 |   ÷2.2 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       ×2.2 |  ÷8.6 |   ÷2.5 |
|         | Mbed TLS    |       ×2.9 |  ÷6.6 |   ×1.3 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |       ÷3.9 |   ÷11 |   ×1.2 |
|         | LibHydrogen |       ×2.0 |   ÷11 |   ÷5.7 |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
       Cortex M3        | Signature  | Stack | Binary |
                        | check time | size  |  size  |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |       72ms |  5088 |  10334 |
|         | HACL*       |        ×21 |  ÷1.6 |   ×1.8 |
|         | TweetNaCl   |        ×27 |  ÷1.3 |   ÷1.9 |
| Ed25519 | uNaCl       |        ×25 |  ÷1.3 |   ÷1.9 |
|         | C25519      |        ×46 |  ÷4.9 |   ÷2.1 |
|         | WolfSSL     |        ×37 |  ÷3.8 |   ÷1.7 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       ×6.1 |  ÷7.5 |   ÷2.1 |
|         | Mbed TLS    |        ×16 |  ÷6.4 |   ×1.5 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |        ×27 |  ÷6.4 |   ×1.2 |
|         | LibHydrogen |       ×3.0 |   ÷11 |   ÷4.7 |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
       Cortex M4        | Signature  | Stack | Binary |
                        | check time | size  |  size  |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |       45ms |  5088 |  10358 |
|         | HACL*       |        ×28 |  ÷1.6 |   ×1.8 |
|         | TweetNaCl   |        ×32 |  ÷1.4 |   ÷1.9 |
| Ed25519 | uNaCl       |        ×33 |  ÷1.4 |   ÷1.9 |
|         | C25519      |        ×43 |  ÷5.0 |   ÷2.1 |
|         | WolfSSL     |        ×38 |  ÷3.8 |   ÷1.8 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       ×7.7 |  ÷7.7 |   ÷2.1 |
|         | Mbed TLS    |        ×19 |  ÷6.4 |   ×1.5 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |        ×29 |  ÷5.2 |   ×1.2 |
|         | LibHydrogen |       ×5.3 |   ÷12 |   ÷4.8 |
+---------+-------------+------------+-------+--------+
Monocypher is fast. Among all tested libraries, the only one that outperforms it is qDSA on Cortex M0+, because it uses hand-optimised assembly. And if we limit ourselves to Ed25519 (so we can use libsodium on the server side), Monocypher blows everything out of the water.
On the other hand, Monocypher is also a bit bloated. Its binary leans towards the bigger side, and its 5KB stack is the tallest of them all. This problem was partially addressed in version 2.0.6, which reduced stack usage to about 3KB without losing any performance.
Monocypher won’t fit on every embedded platform¹. But when it does, it’s a speed demon. And it can talk to libsodium, which is even faster on the server.
(1) Use -DBLAKE2_NO_UNROLLING to reduce code size. It may even run faster on small processors.
Raw data
Monocypher 2.0.6 (core i5 Skylake, Ubuntu 16.04)
Compiled with -O3 -march=native
Chacha20 : 410 megabytes per second
Poly1305 : 1218 megabytes per second
Auth'd encryption: 307 megabytes per second
BLAKE2b : 683 megabytes per second
SHA-512 : 302 megabytes per second
Argon2i, 3 passes: 511 megabytes per second
x25519 : 8124 exchanges per second
EdDSA(sign) : 14418 signatures per second
EdDSA(check) : 6091 checks per second
Compiled with -O3
Chacha20 : 402 megabytes per second
Poly1305 : 1202 megabytes per second
Auth'd encryption: 301 megabytes per second
BLAKE2b : 651 megabytes per second
SHA-512 : 248 megabytes per second
Argon2i, 3 passes: 445 megabytes per second
x25519 : 8008 exchanges per second
EdDSA(sign) : 14292 signatures per second
EdDSA(check) : 5964 checks per second
Compiled with -O2
Chacha20 : 372 megabytes per second
Poly1305 : 1089 megabytes per second
Auth'd encryption: 277 megabytes per second
BLAKE2b : 579 megabytes per second
SHA-512 : 240 megabytes per second
Argon2i, 3 passes: 368 megabytes per second
x25519 : 7642 exchanges per second
EdDSA(sign) : 12249 signatures per second
EdDSA(check) : 5616 checks per second
Compiled with -Os
Chacha20 : 317 megabytes per second
Poly1305 : 915 megabytes per second
Auth'd encryption: 235 megabytes per second
BLAKE2b : 462 megabytes per second
SHA-512 : 245 megabytes per second
Argon2i, 3 passes: 337 megabytes per second
x25519 : 7589 exchanges per second
EdDSA(sign) : 11648 signatures per second
EdDSA(check) : 5528 checks per second
libsodium 1.0.18 (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)
Compiled with default options:
$ ./configure
$ make && make check
$ sudo make install
Chacha20 : 1900 megabytes per second
Poly1305 : 2337 megabytes per second
Auth'd encryption: 1048 megabytes per second
BLAKE2b : 870 megabytes per second
SHA-512 : 296 megabytes per second
Argon2i, 3 passes: 701 megabytes per second
x25519 : 20688 exchanges per second
EdDSA(sign) : 32899 signatures per second
EdDSA(check) : 13208 checks per second
TweetNaCl (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)
Compiled with -O3 -march=native
Salsa20 : 202 megabytes per second
Poly1305 : 69 megabytes per second
Auth'd encryption: 51 megabytes per second
SHA-512 : 40 megabytes per second
x25519 : 1797 exchanges per second
EdDSA(sign) : 648 signatures per second
EdDSA(check) : 325 checks per second
LibHydrogen (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)
No packaged release as of 2019/10. Used git commit f1f061d 2019-10-02.
Compiled with -O3 -march=native (the default is -Os -march=native).
Random : 200 megabytes per second
Auth'd encryption: 94 megabytes per second
Hash : 162 megabytes per second
sign : 9233 signatures per second
check : 5513 checks per second
Monocypher 2.0.0 (Raspberry-Pi, model 3B)
(Note: EdDSA performance roughly doubled between 2.0.0 and 2.0.6.)
Compiled with -O3 -march=native
Chacha20 : 63 megabytes per second
Poly1305 : 67 megabytes per second
Auth'd encryption: 32 megabytes per second
BLAKE2b : 26 megabytes per second
SHA-512 : 13 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 679 exchanges per second
EdDSA(sign) : 1311 signatures per second
EdDSA(check) : 514 checks per second
libsodium 1.0.16 (Raspberry-Pi, model 3B)
Compiled with default flags.
Chacha20 : 72 megabytes per second
Poly1305 : 166 megabytes per second
Auth'd encryption: 50 megabytes per second
BLAKE2b : 26 megabytes per second
SHA-512 : 11 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 686 exchanges per second
EdDSA(sign) : 1702 signatures per second
EdDSA(check) : 618 checks per second
TweetNaCl (Raspberry-Pi, model 3B)
Compiled with -O3 -march=native
Salsa20 : 64 megabytes per second
Poly1305 : 9 megabytes per second
Auth'd encryption: 7 megabytes per second
SHA-512 : 11 megabytes per second
x25519 : 78 exchanges per second
EdDSA(sign) : 44 signatures per second
EdDSA(check) : 22 checks per second
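If you want to compare your own runs against the listings above, the output is easy to parse. The Python sketch below assumes that every result line has the shape "name : integer unit", as in the raw data on this page; the names `RESULT_LINE` and `parse_results` are made up for this example, and any line that does not fit the pattern (such as the "Compiled with" headers) is skipped.

```python
import re

# One result per line: a name, a colon, an integer, then the unit.
RESULT_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<value>\d+)\s+(?P<unit>.+?)\s*$"
)

def parse_results(text):
    """Map each benchmark name to a (value, unit) pair."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:  # headers and notes do not match and are ignored
            results[match["name"]] = (int(match["value"]), match["unit"])
    return results

# A few lines copied from the Monocypher 2.0.6 listing above.
SAMPLE = """\
Compiled with -O3 -march=native
Chacha20 : 410 megabytes per second
Auth'd encryption: 307 megabytes per second
Argon2i, 3 passes: 511 megabytes per second
x25519 : 8124 exchanges per second
"""

results = parse_results(SAMPLE)
for name, (value, unit) in results.items():
    print(f"{name}: {value} {unit}")
```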