# Speed Benchmarks

For those who care about speed, Monocypher comes with a couple of benchmarks. Run them on your platform if you're not sure it is fast enough. There are also benchmarks for Libsodium and TweetNaCl.

This page reports the results on my Core i5 Skylake laptop (x86-64), and on a Raspberry Pi model 3B (ARM core of a BCM2837). Everything is single-threaded, and compiled with GCC. The versions measured here are Monocypher 2.0.5 and Libsodium 1.0.16.

To avoid a false sense of accuracy, most reported numbers are rounded to the nearest two significant digits. Absolute numbers are expressed in megabytes per second, exchanges per second, or signatures per second.
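As a minimal sketch of that rounding, here is one way to reduce a raw measurement to two significant digits. The helper name `round_sig` is hypothetical and is not part of the benchmark code; it rounds halves up, which is an assumption about how the tables were produced.

```python
def round_sig(value: int, digits: int = 2) -> int:
    """Round a positive integer to `digits` significant digits, halves up."""
    if value <= 0:
        raise ValueError("expected a positive measurement")
    # Size of the last digit we keep: 10 for 685, 100 for 7864, 1 for 26.
    scale = 10 ** max(len(str(value)) - digits, 0)
    return (value + scale // 2) // scale * scale


# Raw x86-64 Monocypher measurements taken from the "Raw data" section.
for raw in (298, 685, 484, 7864):
    print(raw, "->", round_sig(raw))
```

Integer arithmetic is used on purpose: Python's built-in `round` rounds exact halves to the nearest even digit, which would not match the usual "round half up" convention.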

```
      +---------------+----------+----------+----------+-----------+
 x86  | Authenticated | Hash     | Password | Key      | Signature |
  64  | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| def | ... MB/s      | ... MB/s | ... MB/s | ...../s  | ...../s   |
| ASM | ..%           | ..%      | ..%      | ..%      | ..%       |
| -O3 | ..%           | ..%      | ..%      | ..%      | ..%       |
| -O2 | ..%           | ..%      | ..%      | ..%      | ..%       |
| -Os | ..%           | ..%      | ..%      | ..%      | ..%       |
+-----+---------------+----------+----------+----------+-----------+
```

Each row corresponds to slightly different compilation options for each library (the first two rows only make sense when Libsodium is shown).

```
      +-------------------+--------------------+-------------------+
      | Monocypher        | Libsodium          | TweetNaCl         |
+-----+-------------------+--------------------+-------------------+
| ASM | -O3 -march=native | --enable-opt       | -O3 -march=native |
| def | -O3 -march=native | --disable-asm      | -O3 -march=native |
| -O3 | -O3 -march=native | --disable-asm,     | -O3 -march=native |
|     |                   | -O3 -march=native  |                   |
| -O2 | -O2               | --disable-asm, -O2 | -O2               |
| -Os | -Os               | --disable-asm, -Os | -Os               |
+-----+-------------------+--------------------+-------------------+
```

The `--enable-opt` and `--disable-asm` options are set during Libsodium's `configure` step. The other options are set by overriding the `CFLAGS` variable during the `make` step. Note that `--disable-asm` doesn't disable 128-bit arithmetic.

## Effect of compilation options

Not everyone can afford, or even trust, `-O3 -march=native`. Here we measure the effect of compilation options. The baseline is `-O3 -march=native`. The others are compared to it. Libsodium has another line, ASM, where it takes advantage of non-portable implementations (typically compiler intrinsics).

### Monocypher

```
      +---------------+----------+----------+----------+-----------+
 x86  | Authenticated | Hash     | Password | Key      | Signature |
  64  | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| -O3 | 300 MB/s      | 690 MB/s | 480 MB/s | 7900/s   | 14300/s   |
| -O2 | 91%           | 85%      | 73%      | 98%      | 86%       |
| -Os | 78%           | 66%      | 73%      | 96%      | 83%       |
+-----+---------------+----------+----------+----------+-----------+

      +---------------+----------+----------+----------+-----------+
 R-pi | Authenticated | Hash     | Password | Key      | Signature |
      | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| -O3 | 32 MB/s       | 26 MB/s  | 19 MB/s  | 680/s    | 1310/s    |
| -O2 | 97%           | 100%     | 105%     | 96%      | 87%       |
| -Os | 97%           | 120%     | <1 MB/s  | 114%     | 89%       |
+-----+---------------+----------+----------+----------+-----------+
```
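The percentage rows can be recomputed from the raw data at the end of this page. The sketch below does so for the x86-64 `-O2` row, dividing each `-O2` measurement by the `-O3 -march=native` baseline. The function and variable names are hypothetical; only the numbers come from this page.

```python
# Raw x86-64 Monocypher figures, copied from the "Raw data" section.
O3 = {"Auth'd encryption": 298, "Blake2b": 685, "Argon2i": 484,
      "x25519": 7864, "EdDSA(sign)": 14277}
O2 = {"Auth'd encryption": 270, "Blake2b": 579, "Argon2i": 354,
      "x25519": 7718, "EdDSA(sign)": 12258}


def percent_of_baseline(measured: dict, baseline: dict) -> dict:
    """Express each measurement as a whole percentage of its baseline."""
    return {name: round(100 * measured[name] / baseline[name])
            for name in baseline}


relative = percent_of_baseline(O2, O3)
for name, pct in relative.items():
    print(f"{name:18}: {pct}%")
```

The same division, with Monocypher as the baseline instead of `-O3`, gives the cross-library comparisons further down.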

It seems that `-O3` sometimes makes things worse. Also note the abysmal performance of Argon2i with `-Os` on the R-pi. I have no idea what causes this.

### Libsodium

```
      +---------------+----------+----------+----------+-----------+
 x86  | Authenticated | Hash     | Password | Key      | Signature |
  64  | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| def | 300 MB/s      | 580 MB/s | 360 MB/s | 15000/s  | 33000/s   |
| ASM | 380%          | 136%     | 200%     | 133%     | 110%      |
| -O3 | 98%           | 104%     | 121%     | 99%      | 113%      |
| -O2 | 100%          | 99%      | 98%      | 100%     | 100%      |
| -Os | 86%           | 92%      | 68%      | 99%      | 91%       |
+-----+---------------+----------+----------+----------+-----------+
```

Don't trust signature and key exchange timings too much: they are less accurate than the other timings because of how freaking fast they are.

Non-portable implementations are *much* faster on x86-64. (Signature
and key exchange use 128-bit arithmetic for all options here, so the
differences are very small there.)

```
      +---------------+----------+----------+----------+-----------+
 R-pi | Authenticated | Hash     | Password | Key      | Signature |
      | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| def | 50 MB/s       | 26 MB/s  | 19 MB/s  | 680/s    | 1700/s    |
| ASM | 100%          | 100%     | 100%     | 99%      | 100%      |
+-----+---------------+----------+----------+----------+-----------+
```

I did not have the courage to test all build options for the R-pi. Compilation was too damn slow, and I didn't feel like setting up my first cross-compilation toolchain.

Libsodium doesn't seem to have dedicated optimisations for the R-pi.

### TweetNaCl

Note that TweetNaCl uses Salsa20 and SHA-512. Monocypher and Libsodium use Chacha20 and Blake2b instead. While Salsa20 and Chacha20 are mostly comparable, Blake2b is much faster than SHA-512. This puts TweetNaCl at a disadvantage.
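A rough way to see how the two hash functions compare on your own machine is to time them. The sketch below uses Python's standard `hashlib`, not Monocypher, Libsodium or TweetNaCl, so its absolute numbers are not comparable to the tables on this page. The helper name `throughput_mb_per_s` is hypothetical.

```python
import hashlib
import time


def throughput_mb_per_s(hash_fn, size: int = 1_000_000, repeats: int = 5) -> float:
    """Hash `size` zero bytes `repeats` times and report the best run in MB/s."""
    data = bytes(size)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hash_fn(data).digest()
        best = min(best, time.perf_counter() - start)
    # Guard against a timer too coarse to see the run at all.
    best = max(best, 1e-9)
    return size / best / 1e6


results = {
    "Blake2b": throughput_mb_per_s(hashlib.blake2b),
    "SHA-512": throughput_mb_per_s(hashlib.sha512),
}
for name, speed in results.items():
    print(f"{name:8}: {speed:.0f} megabytes per second")
```

Keeping the best of several runs is one simple way to reduce noise from the scheduler and cache warm-up; it is not necessarily what the benchmarks measured here do.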

Also, TweetNaCl doesn't implement password hashing.

```
      +---------------+----------+----------+-----------+
 x86  | Authenticated | Hash     | Key      | Signature |
  64  | encryption    |          | exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 | 60 MB/s       | 213 MB/s | 1739/s   | 646/s     |
| -O2 | 40%           | 40%      | 49%      | 78%       |
| -Os | 38%           | 38%      | 48%      | 77%       |
+-----+---------------+----------+----------+-----------+

      +---------------+----------+----------+-----------+
 R-pi | Authenticated | Hash     | Key      | Signature |
      | encryption    |          | exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 | 7 MB/s        | 11 MB/s  | 78/s     | 44/s      |
| -O2 | 40%           | 64%      | 92%      | 95%       |
| -Os | 40%           | 64%      | 74%      | 77%       |
+-----+---------------+----------+----------+-----------+
```

TweetNaCl is pretty slow, and very sensitive to compilation options. Especially Salsa20, though this is somewhat masked by Poly1305, which is slow everywhere.

Still, I wouldn't dismiss TweetNaCl out of hand. It *is* fast enough
for most casual purposes.

## Speed comparisons

The main course. We use Monocypher's speed as the baseline, at 100%.

### Monocypher vs Libsodium

```
      +---------------+----------+----------+----------+-----------+
 x86  | Authenticated | Hash     | Password | Key      | Signature |
  64  | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| ASM | 380%          | 114%     | 151%     | 260%     | 250%      |
| def | 100%          | 84%      | 74%      | 200%     | 230%      |
| -O3 | 98%           | 88%      | 89%      | 220%     | 260%      |
| -O2 | 111%          | 98%      | 99%      | 200%     | 270%      |
| -Os | 112%          | 116%     | 69%      | 200%     | 250%      |
+-----+---------------+----------+----------+----------+-----------+
```

Libsodium is consistently fast on key exchanges and signatures because of 128-bit arithmetic, and bigger precomputed tables for signatures. Monocypher doesn't use 128-bit arithmetic to stay portable, and it avoids big precomputed tables to save code.

More interesting is the performance of Blake2b and Argon2i. Monocypher
actually *beats* Libsodium's reference implementations. For Blake2b,
this is because Monocypher forcibly unrolls the inner loop, which
enables better constant propagation. This costs about 4 KB of generated
code. For Argon2i, I'm not sure. I suspect the reference
implementation performs extraneous copies and allocations.

Of course, the non-portable implementations are still a bit faster.

```
      +---------------+----------+----------+----------+-----------+
 R-pi | Authenticated | Hash     | Password | Key      | Signature |
      | encryption    |          | hash     | exchange |           |
+-----+---------------+----------+----------+----------+-----------+
| ASM | 156%          | 100%     | 100%     | 101%     | 130%      |
| def | 156%          | 100%     | 100%     | 100%     | 129%      |
+-----+---------------+----------+----------+----------+-----------+
```

No more 128-bit arithmetic on the R-pi, so key exchange performs the same. Signatures still benefit from their bigger precomputed tables.

Monocypher lost its edge for Blake2b. I suspect the unrolled inner loop strains the instruction cache of the R-pi's smaller processor.

Libsodium's Poly1305 is much faster than Monocypher's. I have no idea why: I haven't seen any 32-bit specific implementation. This gives it a significant edge for authenticated encryption, though not nearly as impressive as on x86-64.

### Monocypher vs TweetNaCl

```
      +---------------+----------+----------+-----------+
 x86  | Authenticated | Hash     | Key      | Signature |
  64  | encryption    |          | exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 | 20%           | 31%      | 22%      | 5%        |
| -O2 | 9%            | 14%      | 11%      | 4%        |
| -Os | 14%           | 18%      | 11%      | 4%        |
+-----+---------------+----------+----------+-----------+

      +---------------+----------+----------+-----------+
 R-pi | Authenticated | Hash     | Key      | Signature |
      | encryption    |          | exchange |           |
+-----+---------------+----------+----------+-----------+
| -O3 | 22%           | 42%      | 11%      | 3%        |
| -O2 | 10%           | 27%      | 11%      | 4%        |
| -Os | 10%           | 22%      | 7%       | 3%        |
+-----+---------------+----------+----------+-----------+
```

TweetNaCl is much slower than Monocypher. This might seem strange, considering both Monocypher and TweetNaCl restrict themselves to portable C.

The main reason is that TweetNaCl sacrificed performance to shrink its
source code. Its modular multiplication (Poly1305 and curve25519) is
*very* slow, and unlike Monocypher, TweetNaCl doesn't use a precomputed comb.

Something more devious is going on as well: encryption and hashing aren't as slow as they look. With the maximum optimisation level, they are actually as fast as Monocypher's. Alas, encryption performance is offset by the slow Poly1305 authentication, and hashing performance looks bad because Monocypher cheats by using a faster algorithm.
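To see how a slow Poly1305 drags down authenticated encryption, here is a back-of-the-envelope model. It assumes the cipher and the authenticator each make one pass over the same data, back to back, with no overlap, so the time per megabyte is the sum of the two. That model and the function name `combined_throughput` are my assumptions for illustration; only the input numbers come from the raw data at the end of this page.

```python
def combined_throughput(cipher_mb_s: float, mac_mb_s: float) -> float:
    """Throughput of two passes run back to back over the same data.

    Assumption: one megabyte costs 1/cipher + 1/mac seconds in total,
    with no overlap between the two passes.
    """
    return 1.0 / (1.0 / cipher_mb_s + 1.0 / mac_mb_s)


# Raw x86-64 figures at -O3 -march=native, from the "Raw data" section.
tweetnacl = combined_throughput(232, 82)      # Salsa20, Poly1305
monocypher = combined_throughput(390, 1271)   # Chacha20, Poly1305

# Hypothetical: TweetNaCl's Salsa20 paired with Monocypher's Poly1305 speed.
what_if = combined_throughput(232, 1271)

print(f"TweetNaCl  estimate: {tweetnacl:.1f} MB/s (measured: 60)")
print(f"Monocypher estimate: {monocypher:.1f} MB/s (measured: 298)")
print(f"Hypothetical fast-MAC TweetNaCl: {what_if:.1f} MB/s")
```

Under this model both estimates land within 1 MB/s of the measured authenticated encryption speeds, which is consistent with Poly1305 being the bottleneck in TweetNaCl.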

## Conclusion

Monocypher is closer in performance to Libsodium, and closer in size to TweetNaCl, even on a logarithmic scale. Not quite the best of both worlds, but still a nice sweet spot.

## Raw data

### Monocypher (core i5 Skylake, Ubuntu 16.04)

-O3 -march=native

```
Chacha20 : 390 megabytes per second
Poly1305 : 1271 megabytes per second
Auth'd encryption: 298 megabytes per second
Blake2b : 685 megabytes per second
SHA-512 : 287 megabytes per second
Argon2i, 3 passes: 484 megabytes per second
x25519 : 7864 exchanges per second
EdDSA(sign) : 14277 signatures per second
EdDSA(check) : 6189 checks per second
```

-O2

```
Chacha20 : 361 megabytes per second
Poly1305 : 1065 megabytes per second
Auth'd encryption: 270 megabytes per second
Blake2b : 579 megabytes per second
SHA-512 : 228 megabytes per second
Argon2i, 3 passes: 354 megabytes per second
x25519 : 7718 exchanges per second
EdDSA(sign) : 12258 signatures per second
EdDSA(check) : 5888 checks per second
```

-Os

```
Chacha20 : 307 megabytes per second
Poly1305 : 944 megabytes per second
Auth'd encryption: 231 megabytes per second
Blake2b : 457 megabytes per second
SHA-512 : 228 megabytes per second
Argon2i, 3 passes: 353 megabytes per second
x25519 : 7586 exchanges per second
EdDSA(sign) : 11908 signatures per second
EdDSA(check) : 5751 checks per second
```

### Libsodium 1.0.16 (core i5, Ubuntu 16.04)

--enable-opt, default flags

```
Chacha20 : 2129 megabytes per second
Poly1305 : 2475 megabytes per second
Auth'd encryption: 1147 megabytes per second
Blake2b : 782 megabytes per second
SHA-512 : 347 megabytes per second
Argon2i, 3 passes: 731 megabytes per second
x25519 : 20618 exchanges per second
EdDSA(sign) : 36150 signatures per second
EdDSA(check) : 13207 checks per second
```

--disable-asm, default flags

```
Chacha20 : 403 megabytes per second
Poly1305 : 1161 megabytes per second
Auth'd encryption: 298 megabytes per second
Blake2b : 576 megabytes per second
SHA-512 : 294 megabytes per second
Argon2i, 3 passes: 358 megabytes per second
x25519 : 15465 exchanges per second
EdDSA(sign) : 32750 signatures per second
EdDSA(check) : 13211 checks per second
```

--disable-asm, -O3 -march=native

```
Chacha20 : 393 megabytes per second
Poly1305 : 1113 megabytes per second
Auth'd encryption: 292 megabytes per second
Blake2b : 604 megabytes per second
SHA-512 : 347 megabytes per second
Argon2i, 3 passes: 433 megabytes per second
x25519 : 15317 exchanges per second
EdDSA(sign) : 36949 signatures per second
EdDSA(check) : 13910 checks per second
```

--disable-asm, -O2

```
Chacha20 : 403 megabytes per second
Poly1305 : 1161 megabytes per second
Auth'd encryption: 300 megabytes per second
Blake2b : 569 megabytes per second
SHA-512 : 294 megabytes per second
Argon2i, 3 passes: 352 megabytes per second
x25519 : 15486 exchanges per second
EdDSA(sign) : 32709 signatures per second
EdDSA(check) : 13451 checks per second
```

--disable-asm, -Os

```
Chacha20 : 333 megabytes per second
Poly1305 : 1139 megabytes per second
Auth'd encryption: 258 megabytes per second
Blake2b : 530 megabytes per second
SHA-512 : 290 megabytes per second
Argon2i, 3 passes: 243 megabytes per second
x25519 : 15328 exchanges per second
EdDSA(sign) : 29652 signatures per second
EdDSA(check) : 13166 checks per second
```

### TweetNaCl (core i5 Skylake, Ubuntu 16.04)

-O3 -march=native

```
Salsa20 : 232 megabytes per second
Poly1305 : 82 megabytes per second
Auth'd encryption: 60 megabytes per second
SHA-512 : 213 megabytes per second
x25519 : 1739 exchanges per second
EdDSA(sign) : 646 signatures per second
EdDSA(check) : 323 checks per second
```

-O2

```
Salsa20 : 59 megabytes per second
Poly1305 : 40 megabytes per second
Auth'd encryption: 24 megabytes per second
SHA-512 : 86 megabytes per second
x25519 : 857 exchanges per second
EdDSA(sign) : 505 signatures per second
EdDSA(check) : 253 checks per second
```

-Os

```
Salsa20 : 60 megabytes per second
Poly1305 : 39 megabytes per second
Auth'd encryption: 23 megabytes per second
SHA-512 : 82 megabytes per second
x25519 : 843 exchanges per second
EdDSA(sign) : 497 signatures per second
EdDSA(check) : 249 checks per second
```

### Monocypher (Raspberry Pi, model 3B)

*(Note: EdDSA performance is still based on 2.0.0. Based on x86
speedups, 2.0.5 signatures should be 2.1 times as fast, and 2.0.5
verification should be 1.7 times as fast.)*

-O3 -march=native

```
Chacha20 : 63 megabytes per second
Poly1305 : 67 megabytes per second
Auth'd encryption: 32 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 13 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 679 exchanges per second
EdDSA(sign) : 1311 signatures per second
EdDSA(check) : 514 checks per second
```

-O2

```
Chacha20 : 59 megabytes per second
Poly1305 : 67 megabytes per second
Auth'd encryption: 31 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 13 megabytes per second
Argon2i, 3 passes: 20 megabytes per second
x25519 : 656 exchanges per second
EdDSA(sign) : 1138 signatures per second
EdDSA(check) : 501 checks per second
```

-Os

```
Chacha20 : 57 megabytes per second
Poly1305 : 69 megabytes per second
Auth'd encryption: 31 megabytes per second
Blake2b : 32 megabytes per second
SHA-512 : 14 megabytes per second
Argon2i, 3 passes: 0 megabytes per second
x25519 : 776 exchanges per second
EdDSA(sign) : 1172 signatures per second
EdDSA(check) : 594 checks per second
```

### Libsodium (Raspberry Pi, model 3B)

--enable-opt

```
Chacha20 : 72 megabytes per second
Poly1305 : 166 megabytes per second
Auth'd encryption: 50 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 11 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 686 exchanges per second
EdDSA(sign) : 1702 signatures per second
EdDSA(check) : 618 checks per second
```

--disable-asm

```
Chacha20 : 73 megabytes per second
Poly1305 : 166 megabytes per second
Auth'd encryption: 50 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 11 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 677 exchanges per second
EdDSA(sign) : 1696 signatures per second
EdDSA(check) : 601 checks per second
```

### TweetNaCl (Raspberry Pi, model 3B)

-O3 -march=native

```
Salsa20 : 64 megabytes per second
Poly1305 : 9 megabytes per second
Auth'd encryption: 7 megabytes per second
SHA-512 : 11 megabytes per second
x25519 : 78 exchanges per second
EdDSA(sign) : 44 signatures per second
EdDSA(check) : 22 checks per second
```

-O2

```
Salsa20 : 8 megabytes per second
Poly1305 : 4 megabytes per second
Auth'd encryption: 3 megabytes per second
SHA-512 : 7 megabytes per second
x25519 : 72 exchanges per second
EdDSA(sign) : 42 signatures per second
EdDSA(check) : 21 checks per second
```

-Os

```
Salsa20 : 8 megabytes per second
Poly1305 : 4 megabytes per second
Auth'd encryption: 3 megabytes per second
SHA-512 : 7 megabytes per second
x25519 : 58 exchanges per second
EdDSA(sign) : 34 signatures per second
EdDSA(check) : 17 checks per second
```