Monocypher

Boring crypto that simply works

Speed Benchmarks

Monocypher ships with a couple of benchmarks. Run them on your platform if you're not sure Monocypher is fast enough. There are also benchmarks for Libsodium, TweetNaCl, and Libhydrogen, so you can compare.
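
To give an idea of what such a benchmark measures, here is a minimal sketch of a throughput loop. It is not Monocypher's actual benchmark code: the names bench_fn, xor_pass, and megabytes_per_second are hypothetical, the workload is a stand-in for a real primitive, and counting a megabyte as 10^6 bytes is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the operation being measured. */
typedef void (*bench_fn)(uint8_t *buf, size_t size);

/* Stand-in workload: a real benchmark would call a crypto primitive here. */
static void xor_pass(uint8_t *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[i] ^= 0x5a;
    }
}

/* Calls f repeatedly for at least min_seconds of CPU time (must be > 0),
 * then returns the throughput in megabytes per second.
 * Assumption: one megabyte is 10^6 bytes. */
static double megabytes_per_second(bench_fn f, uint8_t *buf, size_t size,
                                   double min_seconds)
{
    size_t  calls = 0;
    clock_t start = clock();
    double  elapsed;
    do {
        f(buf, size);
        calls++;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);
    return (double)calls * (double)size / 1e6 / elapsed;
}
```

Running the loop for a minimum duration rather than a fixed number of calls keeps the measurement meaningful on both fast and slow machines.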

All results are presented in megabytes per second, or in operations per second ("13.5K" means 13500 operations per second). To avoid a false sense of accuracy, most reported numbers are rounded to two significant digits.
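
The rounding convention can be sketched as follows. This is only an illustration: the function names are hypothetical, and the rule of switching to the "K" suffix at 10000 is an assumption inferred from the tables on this page (8100 is written out, 14K is abbreviated).

```c
#include <stddef.h>
#include <stdio.h>

/* Rounds n to two significant digits, using integer arithmetic only. */
static unsigned long round_2_digits(unsigned long n)
{
    unsigned long scale = 1;
    while (n / scale >= 100) {
        scale *= 10;
    }
    return (n + scale / 2) / scale * scale;
}

/* Writes the rounded count to out, with a "K" suffix from 10000 upwards
 * (assumed threshold, see above). */
static void format_ops(unsigned long n, char *out, size_t size)
{
    unsigned long r = round_2_digits(n);
    if (r >= 10000) {
        snprintf(out, size, "%luK", r / 1000);
    } else {
        snprintf(out, size, "%lu", r);
    }
}
```

For instance, 14418 signatures per second is reported as 14K, and 8124 exchanges per second as 8100.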

Overview

The following tables compare the speed of Monocypher, Libsodium, TweetNaCl, and Libhydrogen on my 64-bit Intel Core i5 (Skylake) CPU, running Ubuntu 18.04. Everything is compiled with Ubuntu's GCC 7.4.0. Libsodium is compiled with the default options, as recommended by the installation page. Everything else uses -O3 -march=native.

              +------+------+------+------+------+-------+
      x86     | AEAD | Hash | Pw   | Key  | Sig  | Check |
      64      |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| Monocypher  |  307 |  683 |  511 | 8100 |  14K |  6000 |
| Libsodium   | 1000 |  870 |  701 |  21K |  33K |   13K |
| TweetNaCl   |   51 |   40 |      | 1800 |  650 |   330 |
| Libhydrogen |   94 |  162 |      |      | 9200 |  5500 |
+-------------+------+------+------+------+------+-------+

The same speeds, relative to Monocypher:

              +------+------+------+------+------+-------+
      x86     | AEAD | Hash | Pw   | Key  | Sig  | Check |
      64      |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| Monocypher  |  307 |  683 |  511 | 8100 |  14K |  6000 |
| Libsodium   | ×3.4 | ×1.3 | ×1.7 | ×2.5 | ×2.3 |  ×2.2 |
| TweetNaCl   | ÷6.0 |  ÷17 |      | ÷4.5 |  ÷22 |   ÷19 |
| Libhydrogen | ÷3.3 | ÷4.2 |      |      | ÷1.6 |  ÷1.1 |
+-------------+------+------+------+------+------+-------+
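
The relative figures appear to be derived from the raw data at the end of this page rather than from the rounded table above (1048 / 307 gives ×3.4, for example). A minimal sketch of that computation, with a hypothetical function name and an assumed formatting rule (one decimal below 10, none above):

```c
#include <stddef.h>
#include <stdio.h>

/* Writes how `other` compares to `base`: "×N" when other is at least as
 * large as base, "÷N" when it is smaller. */
static void format_relative(double base, double other, char *out, size_t size)
{
    const char *sign  = other >= base ? "×" : "÷";
    double      ratio = other >= base ? other / base : base / other;
    if (ratio < 10) {
        snprintf(out, size, "%s%.1f", sign, ratio);
    } else {
        snprintf(out, size, "%s%.0f", sign, ratio);
    }
}
```

With Monocypher's authenticated encryption at 307 megabytes per second as the base, Libsodium's 1048 yields ×3.4 and TweetNaCl's 51 yields ÷6.0.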

Unsurprisingly, Libsodium is the fastest of them all, thanks to its use of vector instructions and 128-bit arithmetic. If you want speed on desktops and servers, this is the one.

Despite restricting itself to portable C code, Monocypher does not lag too far behind. Authenticated encryption can't keep up with Libsodium's excellent vector implementation (from Dolbeau), but the rest is more tolerable.

TweetNaCl is almost exclusively optimised for source code size. Performance wasn't a consideration, so its slow speed is not surprising. Note that part of the poor performance of hashing comes from using SHA-512, which is slower than Blake2b.

Libhydrogen is mostly meant for constrained environments, but has been included here anyway, because its incompatibility with Libsodium means that using it on IoT likely means using it on the server as well. Its relatively poor performance on symmetric crypto is mostly explained by the choice of the Gimli permutation, which is slower than ARX (add-rotate-xor) designs like Chacha20 and Blake2b when implemented in software. (Hardware implementations are more efficient.)
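
To illustrate what an ARX design looks like, here is the Chacha20 quarter round as specified in RFC 8439, written as a standalone sketch (it is not copied from Monocypher's source, and the function names are illustrative). It uses nothing but 32-bit additions, rotations, and XORs, which map directly to cheap CPU instructions.

```c
#include <stdint.h>

/* Rotates x left by n bits, with 0 < n < 32. */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Chacha20 quarter round (RFC 8439, section 2.1): each line is one
 * addition, one XOR, and one rotation. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b;  *d ^= *a;  *d = rotl32(*d, 16);
    *c += *d;  *b ^= *c;  *b = rotl32(*b, 12);
    *a += *b;  *d ^= *a;  *d = rotl32(*d,  8);
    *c += *d;  *b ^= *c;  *b = rotl32(*b,  7);
}
```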

Effect of compilation options

The -O3 -march=native flags are quite aggressive. Not everyone approves of these. Here's how changing flags affects Monocypher:

              +------+------+------+------+------+-------+
      x86     | AEAD | Hash | Pw   | Key  | Sig  | Check |
      64      |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| -O3 -native |  307 |  683 |  511 | 8100 |  14K |  6000 |
| -O3         |  98% |  95% |  87% |  99% |  99% |   98% |
| -O2         |  90% |  84% |  72% |  94% |  85% |   92% |
| -Os         |  76% |  67% |  66% |  93% |  81% |   91% |
+-------------+------+------+------+------+------+-------+

(Note: if -Os is used with -DBLAKE2_NO_UNROLLING to reduce Blake2b code size even further, Blake2b performance drops to 57%.)

Sticking to portable instructions has almost no effect, and optimising for size is mostly tolerable. Be careful about password hashing though: if it runs slower, people will lower its security to compensate.

R-Pi overview (based on old benchmarks)

This comparison uses Monocypher 2.0.0 and Libsodium 1.0.16. The benchmarks should be redone with current versions.

             +--------+--------+--------+-------+--------+
    R-pi     |  AEAD  |  Hash  |  Pw    | Key   |  Sig   |
             |        |        |  hash  | exch  |        |
+------------+--------+--------+--------+-------+--------+
| Monocypher | 32MB/s | 26MB/s | 19MB/s | 680/s | 1310/s |
| Libsodium  |  156%  |  100%  |  100%  | 101%  |  130%  |
| TweetNaCl  |   22%  |   42%  |        |  11%  |    3%  |
+------------+--------+--------+--------+-------+--------+

Third party benchmarks

A paper by Koen Zandberg et al. compares various cryptographic libraries in a constrained environment. It concentrates on firmware updates, for which signature verification is often a bottleneck.

They evaluated version 2.0.5 of Monocypher (version 2.0.6 performs the same, but uses less stack) and report the following. Numbers are rounded for readability; see the paper for the raw data. Stack and binary sizes are in kilobytes. Smaller is better.

                        +------------+-------+--------+
      Cortex M0+        | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |     .53s   | 5.2kb |   13kb |
|         | HACL*       |     7.1s   | 3.2kb |   17kb |
|         | TweetNaCl   |     8.0s   | 3.8kb |  5.6kb |
| Ed25519 | uNaCl       |     8.1s   | 3.8kb |  5.6kb |
|         | C25519      |     4.2s   | .98kb |  4.6kb |
|         | WolfSSL     |     3.7s   | 1.3kb |  5.7kb |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |     1.1s   | .60kb |  5.0kb |
|         | Mbed TLS    |     1.6s   | .79kb |   17kb |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |    0.13s   | .49kb |   15kb |
|         | Libhydrogen |     1.1s   | .49kb |  2.2kb |
+---------+-------------+------------+-------+--------+


                        +------------+-------+--------+
      Cortex M3         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |    .072s   | 5.1kb |   10kb |
|         | HACL*       |     1.5s   | 3.3kb |   19kb |
|         | TweetNaCl   |     2.0s   | 3.8kb |  5.6kb |
| Ed25519 | uNaCl       |     1.8s   | 3.8kb |  5.5kb |
|         | C25519      |     3.3s   | 1.0kb |  4.8kb |
|         | WolfSSL     |     2.7s   | 1.3kb |  5.9kb |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |     .44s   | .68kb |  4.9kb |
|         | Mbed TLS    |     1.1s   | .80kb |   15kb |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |     1.9s   | .79kb |   12kb |
|         | Libhydrogen |     .22s   | .47kb |  2.2kb |
+---------+-------------+------------+-------+--------+


                        +------------+-------+--------+
      Cortex M4         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |    .045s   | 5.1kb |   10kb |
|         | HACL*       |     1.3s   | 3.3kb |   19kb |
|         | TweetNaCl   |     1.5s   | 3.8kb |  5.6kb |
| Ed25519 | uNaCl       |     1.5s   | 3.8kb |  5.5kb |
|         | C25519      |     1.9s   | 1.0kb |  4.8kb |
|         | WolfSSL     |     1.7s   | 1.3kb |  5.9kb |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |     .35s   | .66kb |  4.9kb |
|         | Mbed TLS    |     .84s   | .80kb |   15kb |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |     1.3s   | .97kb |   12kb |
|         | Libhydrogen |     .24s   | .44kb |  2.2kb |
+---------+-------------+------------+-------+--------+

A relative comparison gives a better sense of scale:

                        +------------+-------+--------+
      Cortex M0+        | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |    530ms   |  5200 |  13000 |
|         | HACL*       |      ×13   |  ÷1.6 |   ×1.3 |
|         | TweetNaCl   |      ×15   |  ÷1.4 |   ÷2.3 |
| Ed25519 | uNaCl       |      ×15   |  ÷1.4 |   ÷2.3 |
|         | C25519      |     ×7.9   |  ÷5.3 |   ÷2.7 |
|         | WolfSSL     |     ×6.9   |  ÷4.0 |   ÷2.2 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |     ×2.2   |  ÷8.6 |   ÷2.5 |
|         | Mbed TLS    |     ×2.9   |  ÷6.6 |   ×1.3 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |     ÷3.9   |  ÷11  |   ×1.2 |
|         | Libhydrogen |     ×2.0   |  ÷11  |   ÷5.7 |
+---------+-------------+------------+-------+--------+


                        +------------+-------+--------+
      Cortex M3         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |     72ms   |  5088 |  10334 |
|         | HACL*       |      ×21   |  ÷1.6 |   ×1.8 |
|         | TweetNaCl   |      ×27   |  ÷1.3 |   ÷1.9 |
| Ed25519 | uNaCl       |      ×25   |  ÷1.3 |   ÷1.9 |
|         | C25519      |      ×46   |  ÷4.9 |   ÷2.1 |
|         | WolfSSL     |      ×37   |  ÷3.8 |   ÷1.7 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |     ×6.1   |  ÷7.5 |   ÷2.1 |
|         | Mbed TLS    |      ×16   |  ÷6.4 |   ×1.5 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |      ×27   |  ÷6.4 |   ×1.2 |
|         | Libhydrogen |     ×3.0   |   ÷11 |   ÷4.7 |
+---------+-------------+------------+-------+--------+


                        +------------+-------+--------+
      Cortex M4         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |     45ms   |  5088 |  10358 |
|         | HACL*       |      ×28   |  ÷1.6 |   ×1.8 |
|         | TweetNaCl   |      ×32   |  ÷1.4 |   ÷1.9 |
| Ed25519 | uNaCl       |      ×33   |  ÷1.4 |   ÷1.9 |
|         | C25519      |      ×43   |  ÷5.0 |   ÷2.1 |
|         | WolfSSL     |      ×38   |  ÷3.8 |   ÷1.8 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |     ×7.7   |  ÷7.7 |   ÷2.1 |
|         | Mbed TLS    |      ×19   |  ÷6.4 |   ×1.5 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |      ×29   |  ÷5.2 |   ×1.2 |
|         | Libhydrogen |     ×5.3   |   ÷12 |   ÷4.8 |
+---------+-------------+------------+-------+--------+

Monocypher is fast. Among all tested libraries, the only one that outperforms it is qDSA on Cortex M0+, because it uses hand-optimised assembly. And if we limit ourselves to Ed25519 (so we can use Libsodium on the server side), Monocypher blows everything out of the water.

On the other hand, Monocypher is also a bit bloated. The binary tends to lean on the bigger side, and its 5KB stack is the tallest of them all. This problem was partially addressed in version 2.0.6, which reduced stack usage down to about 3KB, without losing any performance.

Monocypher won't fit on every embedded platform¹. But when it does, it's a speed demon. And it can talk to Libsodium, which is even faster on the server.

¹ Use -DBLAKE2_NO_UNROLLING to reduce code size. It may even run faster on small processors.

Raw data

Monocypher 2.0.6 (core i5 Skylake, Ubuntu 18.04)

Compiled with -O3 -march=native

Chacha20         :   410 megabytes  per second
Poly1305         :  1218 megabytes  per second
Auth'd encryption:   307 megabytes  per second
Blake2b          :   683 megabytes  per second
Sha512           :   302 megabytes  per second
Argon2i, 3 passes:   511 megabytes  per second
x25519           :  8124 exchanges  per second
EdDSA(sign)      : 14418 signatures per second
EdDSA(check)     :  6091 checks     per second

Compiled with -O3

Chacha20         :   402 megabytes  per second
Poly1305         :  1202 megabytes  per second
Auth'd encryption:   301 megabytes  per second
Blake2b          :   651 megabytes  per second
Sha512           :   248 megabytes  per second
Argon2i, 3 passes:   445 megabytes  per second
x25519           :  8008 exchanges  per second
EdDSA(sign)      : 14292 signatures per second
EdDSA(check)     :  5964 checks     per second

Compiled with -O2

Chacha20         :   372 megabytes  per second
Poly1305         :  1089 megabytes  per second
Auth'd encryption:   277 megabytes  per second
Blake2b          :   579 megabytes  per second
Sha512           :   240 megabytes  per second
Argon2i, 3 passes:   368 megabytes  per second
x25519           :  7642 exchanges  per second
EdDSA(sign)      : 12249 signatures per second
EdDSA(check)     :  5616 checks     per second

Compiled with -Os

Chacha20         :   317 megabytes  per second
Poly1305         :   915 megabytes  per second
Auth'd encryption:   235 megabytes  per second
Blake2b          :   462 megabytes  per second
Sha512           :   245 megabytes  per second
Argon2i, 3 passes:   337 megabytes  per second
x25519           :  7589 exchanges  per second
EdDSA(sign)      : 11648 signatures per second
EdDSA(check)     :  5528 checks     per second

Libsodium 1.0.18 (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)

Compiled with default options:

$ ./configure
$ make && make check
$ sudo make install

Chacha20         :  1900 megabytes  per second
Poly1305         :  2337 megabytes  per second
Auth'd encryption:  1048 megabytes  per second
Blake2b          :   870 megabytes  per second
Sha512           :   296 megabytes  per second
Argon2i, 3 passes:   701 megabytes  per second
x25519           : 20688 exchanges  per second
EdDSA(sign)      : 32899 signatures per second
EdDSA(check)     : 13208 checks     per second

TweetNaCl (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)

Compiled with -O3 -march=native

Salsa20          :   202 megabytes  per second
Poly1305         :    69 megabytes  per second
Auth'd encryption:    51 megabytes  per second
Sha512           :    40 megabytes  per second
x25519           :  1797 exchanges  per second
EdDSA(sign)      :   648 signatures per second
EdDSA(check)     :   325 checks     per second

Libhydrogen (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)

No packaged release as of 2019/10. Used git commit f1f061d 2019-10-02.

Compiled with -O3 -march=native (the default is -Os -march=native).

Random           :   200 megabytes  per second
Auth'd encryption:    94 megabytes  per second
Hash             :   162 megabytes  per second
sign             :  9233 signatures per second
check            :  5513 checks     per second

Monocypher 2.0.0 (Raspberry-Pi, model 3B)

(Note: EdDSA performance roughly doubled between 2.0.0 and 2.0.6.)

Compiled with -O3 -march=native

Chacha20         :    63 megabytes  per second
Poly1305         :    67 megabytes  per second
Auth'd encryption:    32 megabytes  per second
Blake2b          :    26 megabytes  per second
SHA-512          :    13 megabytes  per second
Argon2i, 3 passes:    19 megabytes  per second
x25519           :   679 exchanges  per second
EdDSA(sign)      :  1311 signatures per second
EdDSA(check)     :   514 checks     per second

Libsodium 1.0.16 (Raspberry-Pi, model 3B)

Compiled with default flags.

Chacha20         :    72 megabytes  per second
Poly1305         :   166 megabytes  per second
Auth'd encryption:    50 megabytes  per second
Blake2b          :    26 megabytes  per second
SHA-512          :    11 megabytes  per second
Argon2i, 3 passes:    19 megabytes  per second
x25519           :   686 exchanges  per second
EdDSA(sign)      :  1702 signatures per second
EdDSA(check)     :   618 checks     per second

TweetNaCl (Raspberry-Pi, model 3B)

Compiled with -O3 -march=native

Salsa20          :    64 megabytes  per second
Poly1305         :     9 megabytes  per second
Auth'd encryption:     7 megabytes  per second
SHA-512          :    11 megabytes  per second
x25519           :    78 exchanges  per second
EdDSA(sign)      :    44 signatures per second
EdDSA(check)     :    22 checks     per second