Monocypher

Boring crypto that simply works

Speed Benchmarks

For those who care about speed, Monocypher comes with a couple of benchmarks. Run them on your platform if you're not sure it is fast enough. There are also benchmarks for Libsodium and TweetNaCl.

This page reports the results on my Core i5 Skylake laptop (x86-64), and on a Raspberry Pi model 3B (ARM core of a BCM2837). Everything is single-threaded, and compiled with GCC. The versions measured here are Monocypher 2.0.5 and Libsodium 1.0.16.

To avoid a false sense of accuracy, most reported numbers are rounded to two significant digits. Absolute numbers are expressed in megabytes per second, exchanges per second, or signatures per second.
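For reference, here is a minimal sketch of that rounding in Python. The helper name round_sig is hypothetical, and rounding halves upward is an assumption (the benchmark scripts may round differently). Python's built-in round() rounds halves to even, hence the decimal module.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value, digits=2):
    """Round an integer reading to `digits` significant digits.

    Halves are rounded up, which is an assumption about how the
    tables on this page were produced.
    """
    if value == 0:
        return 0
    d = Decimal(value)
    # adjusted() is the position of the most significant digit (685 -> 2).
    exponent = d.adjusted() - (digits - 1)
    quantum = Decimal(1).scaleb(exponent)
    return int(d.quantize(quantum, rounding=ROUND_HALF_UP))

# Raw readings taken from the listings at the end of this page.
print(round_sig(298), round_sig(685), round_sig(7864))
```

With the raw readings 298, 685 and 7864, this gives 300, 690 and 7900, matching the first row of the Monocypher x86-64 table.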

      +---------------+----------+----------+----------+-----------+
  x86 | Authenticated |   Hash   | Password | key      | Signature |
  64  | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| def |    ... MB/s   | ... MB/s | ... MB/s |  ...../s |  ...../s  |
| ASM |     ..%       |    ..%   |    ..%   |    ..%   |     ..%   |
| -O3 |     ..%       |    ..%   |    ..%   |    ..%   |     ..%   |
| -O2 |     ..%       |    ..%   |    ..%   |    ..%   |     ..%   |
| -Os |     ..%       |    ..%   |    ..%   |    ..%   |     ..%   |
+-----+---------------+----------+----------+----------+-----------+

The rows correspond to slightly different compilation options for each library (the first two rows only make sense when Libsodium is shown).

      +-------------------+--------------------+-------------------+
      |    Monocypher     |      Libsodium     |     TweetNaCl     |
+-----+-------------------+--------------------+-------------------+
| ASM | -O3 -march=native | --enable-opt       | -O3 -march=native |
| def | -O3 -march=native | --disable-asm      | -O3 -march=native |
| -O3 | -O3 -march=native | --disable-asm,     | -O3 -march=native |
|     |                   |  -O3 -march=native |                   |
| -O2 | -O2               | --disable-asm, -O2 | -O2               |
| -Os | -Os               | --disable-asm, -Os | -Os               |
+-----+-------------------+--------------------+-------------------+

The --enable-opt and --disable-asm options are set during Libsodium's configure step. The other options are set by overriding the CFLAGS variable during the make step. Note that --disable-asm doesn't disable 128-bit arithmetic.
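As a rough illustration, the Libsodium column translates to command lines like the ones built below. The names LIBSODIUM_BUILDS and libsodium_build_commands are hypothetical, and the commands are assumed to be run from the Libsodium source directory. The sketch only builds the argument lists (in the form subprocess.run accepts) and does not run anything.

```python
# Row name -> (configure options, CFLAGS override or None for the defaults).
LIBSODIUM_BUILDS = {
    "ASM": (["--enable-opt"], None),
    "def": (["--disable-asm"], None),
    "-O3": (["--disable-asm"], "-O3 -march=native"),
    "-O2": (["--disable-asm"], "-O2"),
    "-Os": (["--disable-asm"], "-Os"),
}

def libsodium_build_commands(row):
    """Return the (configure, make) argument lists for one table row."""
    configure_options, cflags = LIBSODIUM_BUILDS[row]
    configure = ["./configure", *configure_options]
    # CFLAGS is overridden on the make command line, as described above.
    make = ["make"] if cflags is None else ["make", f"CFLAGS={cflags}"]
    return configure, make

for row in LIBSODIUM_BUILDS:
    configure, make = libsodium_build_commands(row)
    print(row, "|", " ".join(configure), "&&", " ".join(make))
```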

Effect of compilation options

Not everyone can afford, or even trust, -O3 -march=native. Here we measure the effect of compilation options. The baseline is -O3 -march=native. The others are compared to it. Libsodium has another line, ASM, where it takes advantage of non-portable implementations (typically compiler intrinsics).
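The percentages in the tables below are plain ratios against the baseline row. A small sketch, using Monocypher's x86-64 readings from the raw data at the end of this page (the dictionary and function names are made up for the example):

```python
# Monocypher on x86-64, taken from the raw data section.
BASELINE_O3 = {"Auth'd encryption": 298, "Blake2b": 685, "Argon2i": 484,
               "x25519": 7864, "EdDSA(sign)": 14277}
MEASURED_O2 = {"Auth'd encryption": 270, "Blake2b": 579, "Argon2i": 354,
               "x25519": 7718, "EdDSA(sign)": 12258}

def relative_percent(measured, baseline):
    """Speed of each primitive as a percentage of the baseline build."""
    return {name: 100 * measured[name] / baseline[name] for name in baseline}

for name, percent in relative_percent(MEASURED_O2, BASELINE_O3).items():
    print(f"{name:18}: {percent:.0f}%")
```

This reproduces the -O2 row of the Monocypher x86-64 table: 91%, 85%, 73%, 98% and 86%.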

Monocypher

      +---------------+----------+----------+----------+-----------+
  x86 | Authenticated |   Hash   | Password | key      | Signature |
  64  | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| -O3 |    300 MB/s   | 690 MB/s | 480 MB/s |  7900/s  |  14300/s  |
| -O2 |      91%      |    85%   |    73%   |    98%   |    86%    |
| -Os |      78%      |    66%   |    73%   |    96%   |    83%    |
+-----+---------------+----------+----------+----------+-----------+

      +---------------+----------+----------+----------+-----------+
 R-pi | Authenticated |   Hash   | Password | key      | Signature |
      | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| -O3 |    32 MB/s    |  26 MB/s |  19 MB/s |   680/s  |  1310/s   |
| -O2 |      97%      |   100%   |  105%    |    96%   |    87%    |
| -Os |      97%      |   120%   |  <1 MB/s |   114%   |    89%    |
+-----+---------------+----------+----------+----------+-----------+

It seems that -O3 sometimes makes things worse. Also note the abysmal performance of Argon2i with -Os on the R-pi. I have no idea what causes this.

Libsodium

      +---------------+----------+----------+----------+-----------+
  x86 | Authenticated |   Hash   | Password | key      | Signature |
  64  | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| def |    300 MB/s   | 580 MB/s | 360 MB/s | 15000/s  |  33000/s  |
| ASM |     380%      |   136%   |   200%   |   133%   |    110%   |
| -O3 |      98%      |   104%   |   121%   |    99%   |    113%   |
| -O2 |     100%      |    99%   |    98%   |   100%   |    100%   |
| -Os |      86%      |    92%   |    68%   |    99%   |     91%   |
+-----+---------------+----------+----------+----------+-----------+

Don't trust signature and key exchange timings too much: they are less accurate than the other timings because of how freaking fast they are.
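A common way to tame this kind of noise is to time fast operations in batches, over a minimum duration, and keep the best of several runs. The sketch below shows the idea in Python. It is not the benchmark code used for this page, the function name and default parameters are hypothetical, and a small SHA-512 hash stands in for a signature or key exchange.

```python
import hashlib
import time

def operations_per_second(operation, min_seconds=0.05, repeats=3, batch=100):
    """Best observed rate of `operation`, timed in batches over several runs."""
    best = 0.0
    for _run in range(repeats):
        count = 0
        start = time.perf_counter()
        elapsed = 0.0
        # Keep going until the run is long enough to dwarf timer resolution.
        while elapsed < min_seconds:
            for _call in range(batch):
                operation()
            count += batch
            elapsed = time.perf_counter() - start
        best = max(best, count / elapsed)
    return best

# Stand-in for a fast operation: hashing a 32-byte message.
rate = operations_per_second(lambda: hashlib.sha512(b"x" * 32).digest())
print(f"{rate:.0f} operations per second")
```

Taking the best run rather than the average is a design choice: interruptions from the rest of the system can only make a run slower, never faster.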

Non-portable implementations are much faster on x86-64. (Signature and key exchange use 128-bit arithmetic for all options here, so the differences are very small there.)

      +---------------+----------+----------+----------+-----------+
 R-pi | Authenticated |   Hash   | Password | key      | Signature |
      | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| def |    50 MB/s    |  26 MB/s |  19 MB/s |   680/s  |   1700/s  |
| ASM |     100%      |   100%   |    100%  |    99%   |    100%   |
+-----+---------------+----------+----------+----------+-----------+

I did not have the courage to test all build options for the R-pi. Compilation was too damn slow, and I didn't feel like setting up my first cross compilation tool-chain.

Libsodium doesn't seem to have dedicated optimisations for the R-pi.

TweetNaCl

Note that TweetNaCl uses Salsa20 and SHA-512. Monocypher and Libsodium use Chacha20 and Blake2b instead. While Salsa20 and Chacha20 are mostly comparable, Blake2b is much faster than SHA-512. This puts TweetNaCl at a disadvantage.
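If you want a quick feel for the gap between the two hashes on your own machine, Python's hashlib exposes both. Keep in mind that this measures whatever implementation backs hashlib on your platform, not Monocypher, Libsodium, or TweetNaCl, so the numbers are not comparable with the tables on this page. The function name and buffer sizes are arbitrary choices for the sketch.

```python
import hashlib
import time

def throughput_mb_per_s(hash_constructor, size=1_000_000, rounds=20):
    """Hash `rounds` buffers of `size` zero bytes and return megabytes per second."""
    data = bytes(size)
    start = time.perf_counter()
    for _ in range(rounds):
        hash_constructor(data).digest()
    elapsed = time.perf_counter() - start
    return size * rounds / elapsed / 1e6

blake2b_speed = throughput_mb_per_s(hashlib.blake2b)
sha512_speed = throughput_mb_per_s(hashlib.sha512)
print(f"Blake2b: {blake2b_speed:.0f} megabytes per second")
print(f"SHA-512: {sha512_speed:.0f} megabytes per second")
```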

Also, TweetNaCl doesn't implement password hashing.

      +---------------+----------+----------+-----------+
  x86 | Authenticated |   Hash   | key      | Signature |
  64  | encryption    |          | Exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 |    60 MB/s    | 213 MB/s |  1739/s  |   646/s   |
| -O2 |      40%      |    40%   |    49%   |    78%    |
| -Os |      38%      |    38%   |    48%   |    77%    |
+-----+---------------+----------+----------+-----------+

      +---------------+----------+----------+-----------+
 R-pi | Authenticated |   Hash   | key      | Signature |
      | encryption    |          | Exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 |     7 MB/s    |  11 MB/s |   78/s   |    44/s   |
| -O2 |      40%      |    64%   |    92%   |    95%    |
| -Os |      40%      |    64%   |    74%   |    77%    |
+-----+---------------+----------+----------+-----------+

TweetNaCl is pretty slow, and very sensitive to compilation options. Especially Salsa20, though this is somewhat masked by Poly1305, which is slow everywhere.

Still, I wouldn't dismiss TweetNaCl out of hand. It is fast enough for most casual purposes.

Speed comparisons

The main course. We use Monocypher's speed as the baseline, at 100%.

Monocypher vs Libsodium

      +---------------+----------+----------+----------+-----------+
  x86 | Authenticated |   Hash   | Password | key      | Signature |
  64  | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| ASM |      380%     |   114%   |   151%   |   260%   |    250%   |
| def |      100%     |    84%   |    74%   |   200%   |    230%   |
| -O3 |       98%     |    88%   |    89%   |   190%   |    260%   |
| -O2 |      111%     |    98%   |    99%   |   200%   |    270%   |
| -Os |      112%     |   116%   |    69%   |   200%   |    250%   |
+-----+---------------+----------+----------+----------+-----------+

Libsodium is consistently fast on key exchanges and signatures because of 128-bit arithmetic, and bigger precomputed tables for signatures. Monocypher doesn't use 128-bit arithmetic to stay portable, and it avoids big precomputed tables to save code.

More interesting is the performance of Blake2b and Argon2i. Monocypher actually beats Libsodium's reference implementations. For Blake2b, this is because Monocypher forcibly unrolls the inner loop, which enables better constant propagation. This costs about 4KB of generated code. For Argon2i, I'm not sure. I suspect the reference implementation performs extraneous copies and allocations.

Of course, the non-portable implementations are still a bit faster.

      +---------------+----------+----------+----------+-----------+
 R-pi | Authenticated |   Hash   | Password | key      | Signature |
      | encryption    |          | hash     | Exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| ASM |      156%     |   100%   |   100%   |   101%   |    130%   |
| def |      156%     |   100%   |   100%   |   100%   |    129%   |
+-----+---------------+----------+----------+----------+-----------+

No more 128-bit arithmetic on the R-pi, so key exchange performs the same. Signatures still benefit from their bigger precomputed tables.

Monocypher lost its edge for Blake2b. I suspect the unrolled inner loop strains the instruction cache of the R-pi's smaller processor.

Libsodium's Poly1305 is much faster than Monocypher's (166 MB/s against 67 MB/s). I have no idea why: I haven't seen any 32-bit specific implementation. This gives it a significant edge for authenticated encryption, though not nearly as impressive as on x86-64.

Monocypher vs TweetNaCl

      +---------------+----------+----------+-----------+
  x86 | Authenticated |   Hash   | key      | Signature |
  64  | encryption    |          | Exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 |      20%      |    31%   |    22%   |     5%    |
| -O2 |       9%      |    14%   |    11%   |     4%    |
| -Os |      10%      |    18%   |    11%   |     4%    |
+-----+---------------+----------+----------+-----------+

      +---------------+----------+----------+-----------+
 R-pi | Authenticated |   Hash   | key      | Signature |
      | encryption    |          | Exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 |      22%      |    42%   |    11%   |     3%    |
| -O2 |      10%      |    27%   |    11%   |     4%    |
| -Os |      10%      |    22%   |     7%   |     3%    |
+-----+---------------+----------+----------+-----------+

TweetNaCl is much slower than Monocypher. This might seem strange, considering both Monocypher and TweetNaCl restrict themselves to portable C.

The main reason is that TweetNaCl sacrificed performance to shrink its source code. Its modular multiplication (used by Poly1305 and Curve25519) is very slow, and unlike Monocypher, it doesn't use a precomputed comb.
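To give an idea of what a precomputed comb buys, here is a toy version of the technique in Python. It is only a sketch under loud assumptions: it is not Monocypher's code, it works in the multiplicative group modulo a prime instead of on the curve, it is not constant time, and the function names and the choice of 4 comb teeth are made up for the example.

```python
def comb_table(g, p, bits, teeth):
    """Precompute products of g^(2^(i*width)) for every subset of the teeth."""
    width = -(-bits // teeth)  # ceiling division: exponent bits per tooth
    base = [pow(g, 1 << (i * width), p) for i in range(teeth)]
    table = [1] * (1 << teeth)
    for j in range(1, 1 << teeth):
        low = j & -j  # lowest set bit of j
        table[j] = table[j ^ low] * base[low.bit_length() - 1] % p
    return table, width

def comb_pow(table, width, teeth, exponent, p):
    """Compute g^exponent mod p, one squaring and one table lookup per column."""
    result = 1
    for column in range(width - 1, -1, -1):
        result = result * result % p
        index = 0
        for i in range(teeth):
            index |= ((exponent >> (i * width + column)) & 1) << i
        result = result * table[index] % p
    return result

p = 2**255 - 19
table, width = comb_table(2, p, bits=255, teeth=4)
exponent = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF
assert comb_pow(table, width, 4, exponent, p) == pow(2, exponent, p)
```

With 4 teeth, a 255-bit exponent costs 64 squarings and 64 multiplications, against roughly 255 squarings for plain square-and-multiply. The price is a 16-entry table that only works for one fixed base.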

Something more devious is going on as well: encryption and hashing aren't as slow as they look. With the maximum optimisation level, TweetNaCl's Salsa20 and SHA-512 are about as fast as Monocypher's Chacha20 and SHA-512 on the R-pi (64 vs 63 MB/s and 11 vs 13 MB/s), and within a factor of two on x86-64 (232 vs 390 MB/s and 213 vs 287 MB/s). Alas, encryption performance is offset by the slow Poly1305 authentication, and hashing performance looks bad because Monocypher cheats by using a faster algorithm.
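One way to see how much a slow Poly1305 costs: if authenticated encryption is modelled as one pass of the cipher followed by one pass of the authenticator over the same bytes, the time per byte adds up and the speeds combine harmonically. This model is an assumption, not a description of any library's internals, but it lands close to the raw x86-64 numbers listed at the end of this page.

```python
def combined_throughput(cipher_mb_s, mac_mb_s):
    """Throughput of two sequential passes over the same data, in MB/s."""
    return 1.0 / (1.0 / cipher_mb_s + 1.0 / mac_mb_s)

# (cipher speed, Poly1305 speed, measured auth'd encryption), -O3 on x86-64.
readings = {
    "TweetNaCl":  (232, 82, 60),
    "Monocypher": (390, 1271, 298),
}
for library, (cipher, mac, measured) in readings.items():
    predicted = combined_throughput(cipher, mac)
    print(f"{library:10}: predicted {predicted:.0f} MB/s, measured {measured} MB/s")
```

For TweetNaCl the model predicts about 61 MB/s: a 232 MB/s cipher cannot help much when the authenticator runs at 82 MB/s.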

Conclusion

Monocypher is closer in performance to Libsodium, and closer in size to TweetNaCl, even on a logarithmic scale. Not quite the best of both worlds, but still a nice sweet spot.

Raw data

Monocypher (Core i5 Skylake, Ubuntu 16.04)

-O3 -march=native

Chacha20         :   390 megabytes  per second
Poly1305         :  1271 megabytes  per second
Auth'd encryption:   298 megabytes  per second
Blake2b          :   685 megabytes  per second
SHA-512          :   287 megabytes  per second
Argon2i, 3 passes:   484 megabytes  per second
x25519           :  7864 exchanges  per second
EdDSA(sign)      : 14277 signatures per second
EdDSA(check)     :  6189 checks     per second

-O2

Chacha20         :   361 megabytes  per second
Poly1305         :  1065 megabytes  per second
Auth'd encryption:   270 megabytes  per second
Blake2b          :   579 megabytes  per second
SHA-512          :   228 megabytes  per second
Argon2i, 3 passes:   354 megabytes  per second
x25519           :  7718 exchanges  per second
EdDSA(sign)      : 12258 signatures per second
EdDSA(check)     :  5888 checks     per second

-Os

Chacha20         :   307 megabytes  per second
Poly1305         :   944 megabytes  per second
Auth'd encryption:   231 megabytes  per second
Blake2b          :   457 megabytes  per second
SHA-512          :   228 megabytes  per second
Argon2i, 3 passes:   353 megabytes  per second
x25519           :  7586 exchanges  per second
EdDSA(sign)      : 11908 signatures per second
EdDSA(check)     :  5751 checks     per second

Libsodium 1.0.16 (Core i5 Skylake, Ubuntu 16.04)

--enable-opt, default flags

Chacha20         :  2129 megabytes  per second
Poly1305         :  2475 megabytes  per second
Auth'd encryption:  1147 megabytes  per second
Blake2b          :   782 megabytes  per second
SHA-512          :   347 megabytes  per second
Argon2i, 3 passes:   731 megabytes  per second
x25519           : 20618 exchanges  per second
EdDSA(sign)      : 36150 signatures per second
EdDSA(check)     : 13207 checks     per second

--disable-asm, default flags

Chacha20         :   403 megabytes  per second
Poly1305         :  1161 megabytes  per second
Auth'd encryption:   298 megabytes  per second
Blake2b          :   576 megabytes  per second
SHA-512          :   294 megabytes  per second
Argon2i, 3 passes:   358 megabytes  per second
x25519           : 15465 exchanges  per second
EdDSA(sign)      : 32750 signatures per second
EdDSA(check)     : 13211 checks     per second

--disable-asm, -O3 -march=native

Chacha20         :   393 megabytes  per second
Poly1305         :  1113 megabytes  per second
Auth'd encryption:   292 megabytes  per second
Blake2b          :   604 megabytes  per second
SHA-512          :   347 megabytes  per second
Argon2i, 3 passes:   433 megabytes  per second
x25519           : 15317 exchanges  per second
EdDSA(sign)      : 36949 signatures per second
EdDSA(check)     : 13910 checks     per second

--disable-asm, -O2

Chacha20         :   403 megabytes  per second
Poly1305         :  1161 megabytes  per second
Auth'd encryption:   300 megabytes  per second
Blake2b          :   569 megabytes  per second
SHA-512          :   294 megabytes  per second
Argon2i, 3 passes:   352 megabytes  per second
x25519           : 15486 exchanges  per second
EdDSA(sign)      : 32709 signatures per second
EdDSA(check)     : 13451 checks     per second

--disable-asm, -Os

Chacha20         :   333 megabytes  per second
Poly1305         :  1139 megabytes  per second
Auth'd encryption:   258 megabytes  per second
Blake2b          :   530 megabytes  per second
SHA-512          :   290 megabytes  per second
Argon2i, 3 passes:   243 megabytes  per second
x25519           : 15328 exchanges  per second
EdDSA(sign)      : 29652 signatures per second
EdDSA(check)     : 13166 checks     per second

TweetNaCl (Core i5 Skylake, Ubuntu 16.04)

-O3 -march=native

Salsa20          :   232 megabytes  per second
Poly1305         :    82 megabytes  per second
Auth'd encryption:    60 megabytes  per second
SHA-512          :   213 megabytes  per second
x25519           :  1739 exchanges  per second
EdDSA(sign)      :   646 signatures per second
EdDSA(check)     :   323 checks     per second

-O2

Salsa20          :    59 megabytes  per second
Poly1305         :    40 megabytes  per second
Auth'd encryption:    24 megabytes  per second
SHA-512          :    86 megabytes  per second
x25519           :   857 exchanges  per second
EdDSA(sign)      :   505 signatures per second
EdDSA(check)     :   253 checks     per second

-Os

Salsa20          :    60 megabytes  per second
Poly1305         :    39 megabytes  per second
Auth'd encryption:    23 megabytes  per second
SHA-512          :    82 megabytes  per second
x25519           :   843 exchanges  per second
EdDSA(sign)      :   497 signatures per second
EdDSA(check)     :   249 checks     per second

Monocypher (Raspberry Pi model 3B)

(Note: EdDSA performance is still based on 2.0.0. Based on x86 speedups, 2.0.5 signatures should be 2.1 times as fast, and 2.0.5 verification should be 1.7 times as fast.)

-O3 -march=native

Chacha20         :    63 megabytes  per second
Poly1305         :    67 megabytes  per second
Auth'd encryption:    32 megabytes  per second
Blake2b          :    26 megabytes  per second
SHA-512          :    13 megabytes  per second
Argon2i, 3 passes:    19 megabytes  per second
x25519           :   679 exchanges  per second
EdDSA(sign)      :  1311 signatures per second
EdDSA(check)     :   514 checks     per second

-O2

Chacha20         :    59 megabytes  per second
Poly1305         :    67 megabytes  per second
Auth'd encryption:    31 megabytes  per second
Blake2b          :    26 megabytes  per second
SHA-512          :    13 megabytes  per second
Argon2i, 3 passes:    20 megabytes  per second
x25519           :   656 exchanges  per second
EdDSA(sign)      :  1138 signatures per second
EdDSA(check)     :   501 checks     per second

-Os

Chacha20         :    57 megabytes  per second
Poly1305         :    69 megabytes  per second
Auth'd encryption:    31 megabytes  per second
Blake2b          :    32 megabytes  per second
SHA-512          :    14 megabytes  per second
Argon2i, 3 passes:     0 megabytes  per second
x25519           :   776 exchanges  per second
EdDSA(sign)      :  1172 signatures per second
EdDSA(check)     :   594 checks     per second

Libsodium (Raspberry Pi model 3B)

--enable-opt

Chacha20         :    72 megabytes  per second
Poly1305         :   166 megabytes  per second
Auth'd encryption:    50 megabytes  per second
Blake2b          :    26 megabytes  per second
SHA-512          :    11 megabytes  per second
Argon2i, 3 passes:    19 megabytes  per second
x25519           :   686 exchanges  per second
EdDSA(sign)      :  1702 signatures per second
EdDSA(check)     :   618 checks     per second

--disable-asm

Chacha20         :    73 megabytes  per second
Poly1305         :   166 megabytes  per second
Auth'd encryption:    50 megabytes  per second
Blake2b          :    26 megabytes  per second
SHA-512          :    11 megabytes  per second
Argon2i, 3 passes:    19 megabytes  per second
x25519           :   677 exchanges  per second
EdDSA(sign)      :  1696 signatures per second
EdDSA(check)     :   601 checks     per second

TweetNaCl (Raspberry Pi model 3B)

-O3 -march=native

Salsa20          :    64 megabytes  per second
Poly1305         :     9 megabytes  per second
Auth'd encryption:     7 megabytes  per second
SHA-512          :    11 megabytes  per second
x25519           :    78 exchanges  per second
EdDSA(sign)      :    44 signatures per second
EdDSA(check)     :    22 checks     per second

-O2

Salsa20          :     8 megabytes  per second
Poly1305         :     4 megabytes  per second
Auth'd encryption:     3 megabytes  per second
SHA-512          :     7 megabytes  per second
x25519           :    72 exchanges  per second
EdDSA(sign)      :    42 signatures per second
EdDSA(check)     :    21 checks     per second

-Os

Salsa20          :     8 megabytes  per second
Poly1305         :     4 megabytes  per second
Auth'd encryption:     3 megabytes  per second
SHA-512          :     7 megabytes  per second
x25519           :    58 exchanges  per second
EdDSA(sign)      :    34 signatures per second
EdDSA(check)     :    17 checks     per second
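
The listings above all follow the same "name : value unit per second" layout, so they are easy to load for your own comparisons. A minimal parsing sketch, with a hypothetical function name and a regular expression written against the listings on this page only:

```python
import re

# One benchmark line: a name, a colon, an integer, a unit, then "per second".
LINE = re.compile(r"^(?P<name>.+?)\s*:\s*(?P<value>\d+)\s+(?P<unit>\w+)\s+per second$")

def parse_results(text):
    """Map each primitive name to its (value, unit) pair."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            results[match["name"]] = (int(match["value"]), match["unit"])
    return results

sample = """
Chacha20         :   390 megabytes  per second
Argon2i, 3 passes:   484 megabytes  per second
x25519           :  7864 exchanges  per second
EdDSA(sign)      : 14277 signatures per second
"""
for name, (value, unit) in parse_results(sample).items():
    print(name, value, unit)
```

Lines that do not match the layout (headings, blank lines) are skipped.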