Tips for Faster Rust Compile Times
When it comes to runtime performance, Rust is one of the fastest guns in the west. It is on par with the likes of C and C++ and sometimes even surpasses them. Compile times, however? That's a different story.
Why Is Rust Compilation Slow?
Wait a sec, slow in comparison to what? If you compare it with Go, for example, the Go compiler is doing a lot less work in general. It has no macros, and it lacked generics until Go 1.18. Also, the Go compiler was built from scratch as a monolithic tool consisting of both the frontend and the backend (rather than relying on, say, LLVM to take over the backend part, which is the case for Rust or Swift). This has advantages (more options for tweaking the entire process, yay) and disadvantages (higher maintenance costs and fewer supported architectures).
Comparing across toolchains makes little sense here, and compile times are mostly fine for smaller projects, so if your project builds fast enough, your job here is done.
Choosing Runtime Over Compile-Time Performance
As is often cautioned in debates among their designers, programming language design is full of tradeoffs. One of those fundamental tradeoffs is runtime performance vs. compile-time performance, and the Rust team nearly always (if not always) chose runtime over compile-time.
— Brian Anderson
Overall, there are a few features and design decisions that limit Rust compilation speed:
- Macros: Code generation with macros can be quite expensive.
- Type checking
- Monomorphization: this is the process of generating specialized versions of generic functions. E.g., a function that takes an Into<String> gets compiled into a separate copy for each concrete type it is called with, such as one that takes a String.
- LLVM: that's the default compiler backend for Rust, where a lot of the heavy-lifting (like code-optimizations) takes place. LLVM is notorious for being slow.
- Linking: Strictly speaking, this is not part of compiling but happens right after. It "connects" your Rust binary with the system libraries. cargo does not explicitly mark the linking step, so many people add it to the overall compilation time.
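To make the monomorphization point a bit more concrete, here is a minimal sketch. The names `describe` and `demo` are made up for illustration, and `std::any::type_name` is only used to reveal which concrete type each compiled copy was generated for.

```rust
use std::any::type_name;

/// Generic over anything convertible into a `String`.
/// The compiler emits one specialized copy per concrete `T` that is used.
fn describe<T: Into<String>>(value: T) -> String {
    // `type_name::<T>()` differs per instantiation, which shows that
    // each call below is served by its own compiled copy.
    let ty = type_name::<T>();
    let text: String = value.into();
    format!("{text} (compiled for {ty})")
}

/// Calls `describe` with two different argument types,
/// which forces two copies of the function to be generated.
fn demo() -> (String, String) {
    (describe("hello"), describe(String::from("hello")))
}
```

Every additional argument type adds another copy that has to be type-checked, optimized, and passed through LLVM, which is where the extra compile time comes from.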
If you're interested in all the gory details, check out this blog post by Brian Anderson.
Making the Rust compiler faster is an ongoing process, and many fearless people are working on it. Thanks to their hard work, compiler speed has improved 30-40% across the board year-to-date, with some projects seeing up to 45%+ improvements. On top of that, Rust tracks compile regressions on a website dedicated to performance.
Work is also put into optimizing the LLVM backend. Rumor has it that there's still a lot of low-hanging fruit. 🍇
Overall, the Rust compiler is legitimately doing a great job. That said, above a certain project size, the compile times are... let's just say they could be better.
According to the Rust 2019 survey, improving compile times is #4 on the Rust wishlist.
But all hope is not lost! Below is a list of tips and tricks on how to make your Rust project compile faster today. They are roughly ordered by practicality, so start at the top and work your way down until you're happy.
Use cargo check Instead Of A Full Build
Most of the time, you don't even have to compile your project at all; you just want to know if you messed up somewhere. Whenever you can, skip compilation altogether. What you want instead is laser-fast code linting, type- and borrow-checking.
For that, cargo has a special treat for you: ✨ cargo check ✨. Consider the differences in the number of instructions between cargo check on the left and cargo debug in the middle. (Note that the scales are different.)
A sweet trick I use is to run it in the background with cargo watch. This way, it will re-run whenever you change a file.
⭐ Pro-tip: Use cargo watch -c to clear the screen before every run.
Use Rust Analyzer Instead Of Rust Language Server
Another quick way to check if you set the codebase on fire is to use a "language server". That's basically a "linter as a service" that runs next to your editor.
For a long time, the default choice here was rls, but lately, folks moved over to rust-analyzer, because it's more feature-complete and way more snappy. It supports all major IDEs. Switching to that alone might save your day.
Remove Unused Dependencies
So let's say you tried all of the above and find that compilation is still slow. What now?
Dependencies sometimes become obsolete thanks to refactoring. From time to time it helps to check if all of them are still needed to save compile time.
If this is your own project (or a project you like to contribute to), do a quick check if you can toss anything with cargo-udeps:
cargo install cargo-udeps && cargo +nightly udeps
Update Remaining Dependencies
Next, update your dependencies, because they themselves could have tidied up their dependency tree lately.
Take a deep dive into your dependency tree (the tooling for that is built right into cargo itself) to find any outdated dependencies. On top of that, run cargo audit to get notified about any vulnerabilities which need to be addressed, or deprecated crates which need a replacement.
Here's a nice workflow that I learned from /u/oherrala on Reddit:
1. Run cargo update to update to the latest semver compatible version.
2. Run cargo outdated -wR to find newer, possibly incompatible dependencies. Update those and fix code as needed.
3. Find duplicate versions of a dependency and figure out where they come from: cargo tree --duplicates shows dependencies which come in multiple versions.
(Thanks to /u/dbdr for pointing this out.)
⭐ Pro-tip: Step 3 is a great way to contribute back to the community! Clone the repository and execute steps 1 and 2. Finally, send a pull request to the maintainers.
Replace Heavy Dependencies
From time to time, it helps to shop around for more lightweight alternatives to popular crates.
cargo tree is your friend here to help you understand which of your dependencies are quite heavy: they require many other crates, causing excessive network I/O and slowing down your build. Then search for lighter alternatives.
Here are a few examples:
- Using serde? Check out miniserde and maybe even nanoserde.
- reqwest is quite heavy. Maybe try attohttpc or ureq, which are more lightweight.
- tokio dragging you down? How about smol? (Edit: This won't help much with build times. More info in this discussion on Reddit.)
- Swap out clap with pico-args if you only need basic argument parsing.
Here's an example where switching crates reduced compile times from 2:22min to 26 seconds.
Use Cargo Workspaces
Cargo has that neat feature called workspaces, which allow you to split one big crate into multiple smaller ones. This code-splitting is great for avoiding repetitive compilation because only crates with changes have to be recompiled. Bigger projects like servo and vector are using workspaces heavily to slim down compile times. Learn more about workspaces here.
Combine All Integration Tests In A Single Binary
Have any integration tests? (These are the ones in your tests folder.) The Rust compiler will create a binary for every single one of them. This can take most of your build time because linking is slooow. 🐢 The reason is that many system linkers (like ld) are single-threaded.
To make the linker's job a little easier, you can put all your tests in one crate. (Basically create a main.rs in your test folder and add your test files as mod in there.)
Then the linker will go ahead and build a single binary only. Sounds nice, but careful: it's still a trade-off as you'll need to expose your internal types and functions (i.e. make them pub).
Might be worth a try, though, because a recent benchmark revealed a 1.9x speedup for one project.
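Here is a rough sketch of what such a combined test crate could look like. The path tests/it/main.rs and the module names are assumptions for illustration. Placing main.rs in a subfolder keeps cargo from treating the sibling files as separate test targets, since every .rs file directly inside tests/ becomes its own binary. In a real project each module would live in its own file and be declared with `mod parsing;`; they are inlined here so the sketch stays self-contained.

```rust
// Hypothetical file: tests/it/main.rs, the root of the single test binary.
// Real projects would use `mod common; mod parsing; mod formatting;` and
// keep the module bodies in tests/it/common.rs, tests/it/parsing.rs, etc.

mod common {
    /// Shared fixture, compiled and linked once for every test module.
    pub fn sample_csv() -> &'static str {
        "1,2,3"
    }
}

mod parsing {
    #[test]
    fn splits_into_three_fields() {
        assert_eq!(super::common::sample_csv().split(',').count(), 3);
    }
}

mod formatting {
    #[test]
    fn joins_with_dashes() {
        let joined = super::common::sample_csv().replace(',', "-");
        assert_eq!(joined, "1-2-3");
    }
}
```

With this layout, cargo test links one test executable instead of one per file, and a single module can still be selected by name, e.g. cargo test parsing.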
Disable Unused Features Of Crate Dependencies
⚠️ Fair warning: it seems that switching off features doesn't always improve compile time. (See tikv's experiences here.)
Check the feature flags of your dependencies. A lot of library maintainers take the effort to split their crate into separate features that can be toggled off on demand. Maybe you don't need all the default functionality from every crate?
tokio has a ton of features that you can disable if needed.
A quick way to list the features of a crate is cargo-feature-set.
Admittedly, features are not very discoverable at the moment because there is no standard way to document them, but we'll get there eventually.
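To see why a disabled feature can save work, it helps to look at how crate authors typically gate code. The following is a minimal sketch of a hypothetical library with an optional `json` feature (both the feature name and the `render` function are made up). When the feature is off, the gated item is stripped before type checking and code generation.

```rust
/// Compiled only when the hypothetical `json` feature is enabled.
#[cfg(feature = "json")]
pub fn render(value: &str) -> String {
    format!("{{\"value\":\"{value}\"}}")
}

/// Fallback that is compiled when the `json` feature is disabled.
/// Nothing from the gated variant above reaches the backend in that case.
#[cfg(not(feature = "json"))]
pub fn render(value: &str) -> String {
    value.to_string()
}
```

On the consumer side, features are switched off in Cargo.toml by setting default-features = false on the dependency and listing only the needed ones under features. In real crates a feature often pulls in additional optional dependencies too, which is where most of the savings tend to come from.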
Use A Ramdisk For Compilation
When starting to compile heavy projects, I noticed that I was throttled on I/O. The reason was that I kept my projects on a measly HDD. A more performant alternative would be SSDs, but if that's not an option, don't throw in the sponge just yet.
Ramdisks to the rescue! These are like "virtual harddisks" that live in system memory.
mkdir -p target && \
sudo mount -t tmpfs none ./target && \
cat /proc/mounts | grep "$(pwd)" | sudo tee -a /etc/fstab
On macOS, you could probably do something similar with this script. I haven't tried that myself, though.
Cache Dependencies With sccache
Another neat project is sccache by Mozilla, which caches compiled crates to avoid repeated compilation.
I had this running on my laptop for a while, but the benefit was rather negligible, to be honest. It works best if you work on a lot of independent projects that share dependencies (in the same version). A common use-case is shared build servers.
Cranelift – The Alternative Rust Compiler
Lately, I was excited to hear that the Rust project is using an alternative compiler that runs in parallel with rustc for every CI build: Cranelift.
Here is a comparison between rustc and Cranelift for some popular crates.
Somewhat unbelieving, I tried to compile vector with both compilers.
The results were astonishing:
- Rustc: 5m 45s
- Cranelift: 3m 13s
I could really feel the difference! What's cool about this is that it creates fully working executable binaries. They won't be optimized as much, but they are great for testing.
Switch To A Faster Linker
The thing that nobody seems to target is linking time. For me, when using something with a big dependency tree like Amethyst, for example linking time on my fairly recent Ryzen 7 1700 is ~10s each time, even if I change only some minute detail only in my code. — /u/Almindor on Reddit
According to the official documentation, "LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. [..] When you link a large program on a multicore machine, you can expect that LLD runs more than twice as fast as the GNU gold linker. Your mileage may vary, though."
If you're on Linux, you can switch to lld with the following Cargo configuration:
[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-C", "link-arg=-fuse-ld=lld",
]
A word of caution: lld might not be working on all platforms yet. At least on macOS, Rust support seems to be broken at the moment, and the work on fixing it has stalled.
Tweak Compiler Flags
Rust comes with a huge set of compiler flags. For special cases, it can help to tweak them for your project.
Profile Compile Times
If you like to dig deeper, Rust compilation can be profiled with cargo rustc -- -Zself-profile. The resulting trace file can be visualized with a flamegraph or the Chromium profiler.
There's also a cargo -Z timings feature that gives some information about how long each compilation step takes, and tracks concurrency information over time. (Newer toolchains expose it on stable as cargo build --timings.)
Another golden one is cargo-llvm-lines, which shows the number of lines generated and objects copied in the LLVM backend:
$ cargo llvm-lines | head -20
  Lines        Copies         Function name
  -----        ------         -------------
  30737 (100%)  1107 (100%)  (TOTAL)
   1395 (4.5%)    83 (7.5%)  core::ptr::drop_in_place
    760 (2.5%)     2 (0.2%)  alloc::slice::merge_sort
    734 (2.4%)     2 (0.2%)  alloc::raw_vec::RawVec<T,A>::reserve_internal
    666 (2.2%)     1 (0.1%)  cargo_llvm_lines::count_lines
    490 (1.6%)     1 (0.1%)  <std::process::Command as cargo_llvm_lines::PipeTo>::pipe_to
    476 (1.5%)     6 (0.5%)  core::result::Result<T,E>::map
    440 (1.4%)     1 (0.1%)  cargo_llvm_lines::read_llvm_ir
    422 (1.4%)     2 (0.2%)  alloc::slice::merge
    399 (1.3%)     4 (0.4%)  alloc::vec::Vec<T>::extend_desugared
    388 (1.3%)     2 (0.2%)  alloc::slice::insert_head
    366 (1.2%)     5 (0.5%)  core::option::Option<T>::map
    304 (1.0%)     6 (0.5%)  alloc::alloc::box_free
    296 (1.0%)     4 (0.4%)  core::result::Result<T,E>::map_err
    295 (1.0%)     1 (0.1%)  cargo_llvm_lines::wrap_args
    291 (0.9%)     1 (0.1%)  core::char::methods::<impl char>::encode_utf8
    286 (0.9%)     1 (0.1%)  cargo_llvm_lines::run_cargo_rustc
    284 (0.9%)     4 (0.4%)  core::option::Option<T>::ok_or_else
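If a generic function of your own shows up in such a listing with a high number of copies, one common remedy is to move the bulk of the work into a non-generic inner function. The sketch below illustrates the idea with a made-up helper called `with_extension`; it is not taken from any of the tools mentioned here.

```rust
use std::path::{Path, PathBuf};

/// Generic shim: monomorphized once per caller type, but it only
/// converts the argument and forwards it, so each copy stays tiny.
pub fn with_extension<P: AsRef<Path>>(path: P, ext: &str) -> PathBuf {
    // Non-generic inner function: compiled exactly once, no matter how
    // many different `P` types are passed to the outer function.
    fn inner(path: &Path, ext: &str) -> PathBuf {
        let mut out = path.to_path_buf();
        out.set_extension(ext);
        out
    }
    inner(path.as_ref(), ext)
}
```

Calling `with_extension` with a &str, a String, and a &Path still creates three copies of the outer function, but the body that does the actual work is only generated once.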
Avoid Procedural Macro Crates
Procedural macros are the hot sauce of Rust development: they burn through CPU cycles so use with care (keyword: monomorphization).
If you heavily use procedural macros in your project (e.g., if you use serde), you can try to sidestep their impact on compile times with watt, a tool that offloads macro compilation to WebAssembly.
From the docs:
By compiling macros ahead-of-time to Wasm, we save all downstream users of the macro from having to compile the macro logic or its dependencies themselves.
Instead, what they compile is a small self-contained Wasm runtime (~3 seconds, shared by all macros) and a tiny proc macro shim for each macro crate to hand off Wasm bytecode into the Watt runtime (~0.3 seconds per proc-macro crate you depend on). This is much less than the 20+ seconds it can take to compile complex procedural macros and their dependencies.
Note that this crate is still experimental.
(Oh, and did I mention that both watt and cargo-llvm-lines were built by David Tolnay, who is a frickin' steamroller?)
Compile On A Beefy Machine
On portable devices, compiling can drain your battery and be slow. To avoid that, I'm using my machine at home, a 6-core AMD FX 6300 with 12GB RAM, as a build machine. I can use it in combination with Visual Studio Code Remote Development.
If you don't have a dedicated machine yourself, you can compile in the cloud.
Gitpod.io is superb for testing a cloud build as they provide you with a beefy machine (currently 16 core Intel Xeon 2.30GHz, 60GB RAM) for free during a limited period. Simply add the Gitpod URL prefix in front of any GitHub repository URL.
Here is an example for one of my Hello Rust episodes.
When it comes to buying dedicated hardware, here are some tips. Generally, you should get a proper multicore CPU like an AMD Ryzen Threadripper plus at least 32 GB of RAM.
Drastic Measures: Overclock Your CPU? 🔥
⚠️ Warning: You can damage your hardware if you don't know what you are doing. Proceed at your own risk.
Here's an idea for the desperate. Now I don't recommend that to everyone, but if you have a standalone desktop computer with a decent CPU, this might be a way to squeeze out the last bits of performance.
Even though the Rust compiler executes a lot of steps in parallel, single-threaded performance is still quite relevant.
As a somewhat drastic measure, you can try to overclock your CPU. Here's a tutorial for my processor. (I owe you some benchmarks from my machine.)
Download ALL The Crates
If you have a slow internet connection, a big part of the initial build process is fetching all those shiny crates from crates.io. To mitigate that, you can download all crates in advance to cache them locally. criner does just that:
git clone https://github.com/the-lean-crate/criner
cd criner
cargo run --release -- mine
The archive size is surprisingly reasonable, with roughly 50GB of required disk space.
Help Others: Upload Leaner Crates For Faster Build Times
cargo-diet helps you build lean crates that significantly reduce download size (sometimes by 98%). It might not directly affect your own build time, but your users will surely be thankful. 😊
Phew! That was a long list. If you have any additional tips, please let me know.
If compiler performance is something you're interested in, why not collaborate on a tool to see what user code is causing rustc to use lots of time?