Rust Compilation Speed
November 17, 2021
Some notes on Rust compilation options and their effect on compile times for a moderately sized Rust application (5000 lines). This looks at tweaking options in .cargo/config as well as the Cargo.toml file.
An example .cargo/config file:
[build]
rustflags = ["-C", "target-cpu=native"]
rustc-wrapper = "sccache"
[target.x86_64-pc-windows-msvc]
linker = "rust-lld.exe"
[target.x86_64-unknown-linux-gnu]
linker = "/usr/bin/clang"
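# Target-specific rustflags replace the [build] rustflags instead of adding to them,
# so target-cpu=native is repeated here to keep it active on Linux.
rustflags = ["-C", "target-cpu=native", "-Clink-arg=-fuse-ld=lld"]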
[profile.dev.package."*"]
opt-level = 3

These tests run in the following environment:
- Rust v1.55, CPU: 12-core Ryzen 5900X, RAM: 32 GB, NVMe SSD
- Using the faster `lld` linker unless specified
- Using the `target-cpu=native` option
- Dependencies are already downloaded, so only compile and link time is measured
- Debug `opt-level` settings apply only to third-party code, to help compile iteration times
Full Rebuild Performance
| Options | Time |
|---|---|
| Debug + sccache second run, opt-level 0, debug = 0 | 0m.25s |
| Debug + sccache second run, opt-level 0, debug = true | 0m.31s |
| Debug, opt-level 0, debug = 0 | 0m.34s |
| Debug, opt-level 0, debug = 1 | 0m.36s |
| Debug, (default) opt-level 0, debug = true | 0m.39s |
| Debug + sccache first run, opt-level 0, debug = 0 | 0m.49s |
| Debug + sccache second run, opt-level 3, debug = 0 | 0m.52s |
| Debug + sccache first run, opt-level 0, debug = true | 0m.59s |
| Debug, opt-level 1, debug = 0 | 0m.59s |
| Debug, opt-level 1, debug = 1 | 1m.03s |
| Debug, opt-level 1, debug = true | 1m.11s |
| Debug + default linker + sccache second run, opt-level 3, debug = 0 | 1m.12s |
| Debug, opt-level 3, debug = 0 | 1m.15s |
| Debug, opt-level 3, debug = 1 | 1m.17s |
| Debug, opt-level 3, debug = true | 1m.31s |
| Debug + sccache first run, opt-level 3, debug = 0 | 1m.51s |
| Release + sccache second run, codegen-units = 16, lto = thin | 0m.50s |
| Release, (default) codegen-units = 16, lto = false | 0m.59s |
| Release, codegen-units = 16, lto = thin | 0m.59s |
| Release, codegen-units = 1, lto = false | 1m.18s |
| Release + sccache first run, codegen-units = 16, lto = thin | 1m.28s |
| Release, codegen-units = 1, lto = thin | 1m.32s |
| Release + sccache second run, codegen-units = 16, lto = true | 1m.47s |
| Release + sccache second run, codegen-units = 1, lto = thin | 2m.13s |
| Release + sccache first run, codegen-units = 16, lto = true | 2m.15s |
| Release, codegen-units = 1, lto = true | 2m.02s |
| Release + sccache second run, codegen-units = 1, lto = true | 2m.12s |
| Release + sccache first run, codegen-units = 1, lto = thin | 2m.13s |
| Release + sccache first run, codegen-units = 1, lto = true | 2m.57s |
- A full rebuild in Debug can be slower than in Release mode. `incremental` compilation is off by default for Release, but it's still surprising.
- `debug` levels of `0` and `1` (line number tables only) are similar; the default full debug `true` is clearly slower (but actually useful in a debugger).
- The `lto` (link time optimization) option gives more performance per unit of compile time than decreasing `codegen-units` (lower parallelism produces faster code but a slower compile, because each codegen unit gets more optimisation context).
- The `lto = thin` setting can compile almost as fast as the less optimized build with no `lto`.
- With `lto` set to `true` or `"fat"` it globally optimizes every crate in the binary, so it can be heavy on RAM usage and doesn't scale well with project size.
- Using sccache may pay off for repeated full rebuilds, especially if you have a fast storage device to access the 10GB shared compilation cache. It works across projects and local build cleans. It's less important for incremental compilation / iteration.
- `sccache` is not that fast with the default linker on Windows; use the faster `lld` linker.
- Looking at sccache stats, `sccache -s` shows that there are some cache misses when `incremental` compilation is used (the default in debug), and some crate types are just not cacheable.
- `sccache` did not help release builds with low `codegen-units`, so unless it's your final optimized release build, prefer the default `codegen-units`.
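The release findings above could be written in Cargo.toml roughly as follows. This is a sketch, not necessarily the exact profile used for the measurements; note that `thin` has to be quoted as a string in the file. The commented-out values are the slower, more heavily optimized alternatives from the table.

```toml
# Cargo.toml
[profile.release]
# Thin LTO: built in about the same time as no LTO in the table above
lto = "thin"
# 16 is the default for release builds
codegen-units = 16

# For a final optimized build, at the cost of a much longer full rebuild:
# lto = true          # equivalent to "fat"
# codegen-units = 1
```

Keeping the expensive settings commented out until a final build keeps everyday release builds close to the default times.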
Incremental Build Performance
Times are for running `cargo build` after editing a string in the source code. If you just need to know whether the code is correct, you can run `cargo check`, which is much quicker, but your editor/IDE is probably already doing that.
| Options | Time |
|---|---|
| opt-level 3, debug = 0 | 2.45s |
| sccache + opt-level 3, debug = 0 | 2.5s |
| sccache + opt-level 1, debug = 1, target-cpu=generic | 2.75s |
| opt-level 3, debug = 0 | 2.78s |
| sccache + opt-level 3, debug = 1 | 2.8s |
| sccache + opt-level 1, debug = 1 | 2.8s |
| default linker + sccache, opt-level 3, debug = 1 | 3.00s |
| default linker, opt-level 3, debug = 1 | 3.02s |
| default linker + sccache, opt-level 1, debug = 1 | 3.02s |
| default linker + sccache, opt-level 1, debug = 1 | 3.04s |
| sccache + opt-level 3, debug = true | 3.5s |
Apart from making sure the `lld` linker is in use, the main speed-up comes from reducing the debug level. If you can get by with line numbers only for symbols, then `debug = 1` instead of the default `debug = true` is a good idea.
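As an illustration, a dev profile with the reduced debug level might look like the fragment below. It assumes the profile is kept in Cargo.toml (the same keys also work in .cargo/config), and the dependency override simply repeats the one from the example config at the top.

```toml
# Cargo.toml
[profile.dev]
opt-level = 0   # default for the dev profile: your own crate stays unoptimized
debug = 1       # line tables only; the default is `true` (full debug info)

# Third-party crates are optimized separately from your own code
[profile.dev.package."*"]
opt-level = 3
```

Switching `debug` back to `true` for a debugging session is a one-line change.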
Debug Code Performance
As mentioned, debug `opt-level` settings are applied only to third-party code to help compile iteration times. Level 0 is the default for the dev profile, and level 3 enables full optimizations. Level 2 will unroll loops, potentially making debugging confusing. Level 3 does more vectorization and inlining. When using a debugger, consider dropping to `opt-level` 0 or 1.
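If only one dependency needs to be stepped through, a per-package override can lower its optimization level while the rest stay at level 3. In this sketch `some-dependency` is a placeholder, not a crate from this project; replace it with the real package name, since Cargo warns about override names that match no package.

```toml
# Cargo.toml
[profile.dev.package."*"]
opt-level = 3        # keep most dependencies fully optimized

# Placeholder name: the one crate you want to debug comfortably
[profile.dev.package.some-dependency]
opt-level = 1
```

A named package override takes precedence over the `"*"` wildcard, so the two sections can coexist.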