Tips for Faster Rust Compile Times
When it comes to runtime performance, Rust is one of the fastest guns in the west. 🔫 It is on par with the likes of C and C++ and sometimes even surpasses those. Compile times, however? That's another story.
Below is a list of tips and tricks on how to make your Rust project compile faster today. They are roughly ordered by practicality, so start at the top and work your way down until you're happy and your compiler goes brrrrrrr.
Wait a sec, slow in comparison to what? That is, if you compare Rust with Go, the Go compiler is doing a lot less work in general. For example, it has no macros, and it only gained generics in Go 1.18. On top of that, the Go compiler was built from scratch as a monolithic toolchain consisting of both the frontend and the backend (rather than relying on, say, LLVM to take over the backend part, which is the case for Rust or Swift). This has advantages (more flexibility when tweaking the entire compilation process, yay) and disadvantages (higher overall maintenance cost and fewer supported architectures).
In general, comparing across different programming languages makes little sense and overall, the Rust compiler is legitimately doing a great job. That said, above a certain project size, the compile times are... let's just say they could be better.
If you'd like to know what's slowing down your builds, run cargo build --timings. This will generate a report on how much time was spent on each step involved in compiling your program.
According to the Rust 2019 survey, improving compile times is #4 on the Rust wishlist.
As is often cautioned in debates among their designers, programming language design is full of tradeoffs. One of those fundamental tradeoffs is runtime performance vs. compile-time performance, and the Rust team nearly always (if not always) chose runtime over compile-time.
— Brian Anderson
Overall, there are a few features and design decisions that limit Rust compilation speed:
- Macros: Code generation with macros can be quite expensive.
- Type checking
- Monomorphization: this is the process of generating specialized versions of generic functions. E.g., a function that takes an Into<String> gets converted into one that takes a String and one that takes a &str.
- LLVM: that's the default compiler backend for Rust, where a lot of the heavy-lifting (like code-optimizations) takes place. LLVM is notorious for being slow.
- Linking: Strictly speaking, this is not part of compiling but happens right after. It "connects" your Rust binary with the system libraries. cargo does not explicitly mark the linking step, so many people add it to the overall compilation time.
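To make the monomorphization point a bit more tangible, here is a minimal sketch. The function name greet is made up for illustration; the idea is only that each distinct concrete type at a call site makes the compiler generate (and LLVM optimize) another copy of the generic function.

```rust
// A generic function: the compiler emits one specialized copy for every
// concrete type `S` it is called with (monomorphization).
// `greet` is a hypothetical name used only for this illustration.
fn greet<S: Into<String>>(name: S) -> String {
    let name: String = name.into();
    format!("Hello, {}!", name)
}

fn main() {
    // Instantiates greet::<&str>
    println!("{}", greet("Ferris"));
    // Instantiates greet::<String>, a second copy of the same function body
    println!("{}", greet(String::from("Ferris")));
}
```

Two call sites with two different argument types mean two compiled copies. In a large codebase with many generic functions this adds up.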
If you're interested in all the gory details, check out this blog post by Brian Anderson.
Making the Rust compiler faster is an ongoing process, and many fearless people are working on it. Thanks to their hard work, compiler speed has improved 30-40% across the board year-to-date, with some projects seeing up to 45%+ improvements.
So make sure you use the latest Rust version, for example by running rustup update.
Most of the time, you don't even have to compile your project at all; you just want to know if you messed up somewhere. Whenever you can, skip compilation altogether. What you need instead is laser-fast code linting, type- and borrow-checking.
For that, cargo has a special treat for you: ✨ cargo check ✨. Consider the differences in the number of instructions between cargo check on the left and a debug build (cargo build) in the middle. (Pay attention to the different scales.)
A sweet trick I use is to run it in the background with cargo watch. This way, it will run cargo check whenever you change a file.
⭐ Pro-tip: Use cargo watch -c to clear the screen before every run.
Another quick way to check if you set the codebase on fire is to use a "language server". That's basically a "linter as a service" that runs next to your editor.
For a long time, the default choice here was RLS, but lately, folks moved over to rust-analyzer, because it's more feature-complete and way more snappy. It supports all major IDEs. Switching to that alone might save your day.
So let's say you tried all of the above and find that compilation is still slow. What now?
Dependencies sometimes become obsolete thanks to refactoring. From time to time it helps to check if all of them are still needed to save compile time.
If this is your own project (or a project you like to contribute to), do a quick check if you can toss anything with cargo-udeps:
cargo install cargo-udeps && cargo +nightly udeps
Next, update your dependencies, because they themselves could have tidied up their dependency tree lately.
Take a deep dive with cargo tree (built right into cargo itself) to find any outdated dependencies. On top of that, use cargo audit to get notified about any vulnerabilities which need to be addressed, or deprecated crates which need a replacement.
Here's a nice workflow that I learned from /u/oherrala on Reddit:
1. Run cargo update to update to the latest semver compatible version.
2. Run cargo outdated -wR to find newer, possibly incompatible dependencies. Update those and fix code as needed.
3. Find duplicate versions of a dependency and figure out where they come from: cargo tree --duplicates shows dependencies which come in multiple versions.
(Thanks to /u/dbdr for pointing this out.)
⭐ Pro-tip: Step 3 is a great way to contribute back to the community! Clone the repository and execute steps 1 and 2. Finally, send a pull request to the maintainers.
From time to time, it helps to shop around for more lightweight alternatives to popular crates.
cargo tree is your friend here to help you understand which of your dependencies are quite heavy: they require many other crates, cause excessive network I/O and slow down your build. Then search for lighter alternatives.
cargo-bloat has a --time flag that shows you the per-crate build time. Very handy!
Here are a few examples:
- Using serde? Check out miniserde and maybe even nanoserde.
- reqwest is quite heavy. Maybe try attohttpc or ureq, which are more lightweight.
- tokio dragging you down? How about smol? (Edit: This won't help much with build times. More info in this discussion on Reddit.)
- Swap out clap with pico-args if you only need basic argument parsing.
Here's an example where switching crates reduced compile times from 2:22min to 26 seconds.
Cargo has a neat feature called workspaces, which allows you to split one big crate into multiple smaller ones. This code-splitting is great for avoiding repetitive compilation because only crates with changes have to be recompiled. Bigger projects like servo and vector are using workspaces heavily to slim down compile times. Learn more about workspaces here.
It's nice that cargo comes with its own little test runner, but especially if you have to build multiple test binaries, cargo nextest can be up to 60% faster than cargo test thanks to its parallel execution model.
You can try it with:

    cargo install cargo-nextest
    cargo nextest run
Have any integration tests? (These are the ones in your tests folder.) Did you know that the Rust compiler will create a binary for every single one of them? And every binary will have to be linked individually. This can take most of your build time because linking is slooow. 🐢 The reason is that many system linkers (like ld) are single threaded.
💡 A linker is a tool that combines the output of a compiler and mashes that into one executable you can run.
To make the linker's job a little easier, you can put all your tests in one crate. (Basically create a main.rs in your test folder and add your test files as mod in there.)
Then the linker will go ahead and build a single binary only. Sounds nice, but careful: it's still a trade-off as you'll need to expose your internal types and functions (i.e. make them pub).
Might be worth a try, though, because a recent benchmark revealed a 1.9x speedup for one project.
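Here is a rough single-file sketch of that layout. The module names (arithmetic, edge_cases) and the add function are hypothetical stand-ins. In a real project, tests/main.rs would only contain declarations like mod arithmetic; pointing at separate files, and add would be imported from your own crate instead of being defined here.

```rust
// Sketch of a combined test crate (think: tests/main.rs).
// Everything ends up in ONE test binary, so the linker runs only once.

// Stand-in for a public function of the crate under test (hypothetical).
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// In a real project: `mod arithmetic;` backed by tests/arithmetic.rs
mod arithmetic {
    #[test]
    fn adds_small_numbers() {
        assert_eq!(super::add(2, 2), 4);
    }
}

// In a real project: `mod edge_cases;` backed by tests/edge_cases.rs
mod edge_cases {
    #[test]
    fn adds_negative_numbers() {
        assert_eq!(super::add(-2, 2), 0);
    }
}

// Only here so the sketch also builds as a plain program;
// a real tests/main.rs does not need a `main` function.
fn main() {
    println!("2 + 2 = {}", add(2, 2));
}
```

The point is purely structural: several logical test files, one compiled and linked binary.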
Check the feature flags of your dependencies. A lot of library maintainers take the effort to split their crate into separate features that can be toggled off on demand. Maybe you don't need all the default functionality from every crate?
tokio has a ton of features that you can disable if not needed.
Another example is bindgen, which enables clap support by default for its binary usage. This isn't needed for library usage, which is the common use-case. Disabling that feature improved compile time of rust-rocksdb by ~13s and ~9s for debug and release builds respectively. Thanks to reader Lilian Anatolie Moraru for mentioning this.
⚠️ Fair warning: it seems that switching off features doesn't always improve compile time. (See tikv's experiences here.) It may still be a good idea for improving security by reducing the code's attack surface.
A quick way to list all features of a crate is cargo-feature-set.
Admittedly, features are not very discoverable at the moment because there is no standard way to document them, but we'll get there eventually.
When starting to compile heavy projects, I noticed that I was throttled on I/O. The reason was that I kept my projects on a measly HDD. A more performant alternative would be SSDs, but if that's not an option, don't throw in the sponge just yet.
Ramdisks to the rescue! These are like "virtual harddisks" that live in system memory.
    mkdir -p target && \
    sudo mount -t tmpfs none ./target && \
    cat /proc/mounts | grep "$(pwd)" | sudo tee -a /etc/fstab
On macOS, you could probably do something similar with this script. I haven't tried that myself, though.
Another neat project is sccache by Mozilla, which caches compiled crates to avoid repeated compilation.
I had this running on my laptop for a while, but the benefit was rather negligible, to be honest. It works best if you work on a lot of independent projects that share dependencies (in the same version). A common use-case is shared build servers.
Lately, I was excited to hear that the Rust project is using an alternative compiler that runs in parallel with rustc for every CI build: Cranelift, also called rustc_codegen_cranelift.
Here is a comparison between rustc and Cranelift for some popular crates (blue means better):
Somewhat unbelieving, I tried to compile vector with both compilers.
The results were astonishing:
- Rustc: 5m 45s
- Cranelift: 3m 13s
I could really notice the difference! What's cool about this is that it creates fully working executable binaries. They won't be optimized as much, but they are great for local development.
The thing that nobody seems to target is linking time. For me, when using something with a big dependency tree like Amethyst, for example linking time on my fairly recent Ryzen 7 1700 is ~10s each time, even if I change only some minute detail only in my code. — /u/Almindor on Reddit
You can check how long your linker takes by running the following commands:
    cargo clean
    cargo +nightly rustc --bin <your_binary_name> -- -Z time-passes
It will output the timings of each step, including link time:
    ...
    time: 0.000 llvm_dump_timing_file
    time: 0.001 serialize_work_products
    time: 0.002 incr_comp_finalize_session_directory
    time: 0.004 link_binary_check_files_are_writeable
    time: 0.614 run_linker
    time: 0.000 link_binary_remove_temps
    time: 0.620 link_binary
    time: 0.622 link_crate
    time: 0.757 link
    time: 3.836 total
    Finished dev [unoptimized + debuginfo] target(s) in 42.75s
If the link steps account for a big percentage of the build time, consider switching over to a different linker. There are quite a few options.
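If you want a number instead of eyeballing the list, the share of link time is just link divided by total. Below is a small sketch that pulls both values out of output shaped like the sample above. The helper names pass_seconds and link_share_percent are made up for this example, and the parser assumes the simple "time: <seconds> <pass>" line shape from the sample; your toolchain may print additional columns, in which case the parsing would need adjusting.

```rust
/// Returns the seconds reported for `pass`, assuming lines shaped like
/// "time: 0.757 link" (the format of the sample output in this article).
/// Hypothetical helper, not part of any official tooling.
fn pass_seconds(output: &str, pass: &str) -> Option<f64> {
    output.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        // Expected tokens: "time:", "<seconds>", "<pass name>"
        if parts.next()? != "time:" {
            return None;
        }
        let secs: f64 = parts.next()?.parse().ok()?;
        if parts.next()? == pass {
            Some(secs)
        } else {
            None
        }
    })
}

/// Percentage of the reported total that was spent in the `link` pass.
fn link_share_percent(output: &str) -> Option<f64> {
    let link = pass_seconds(output, "link")?;
    let total = pass_seconds(output, "total")?;
    Some(100.0 * link / total)
}

fn main() {
    // A shortened copy of the sample output from above.
    let sample = "time: 0.614 run_linker\n\
                  time: 0.620 link_binary\n\
                  time: 0.757 link\n\
                  time: 3.836 total";
    match link_share_percent(sample) {
        Some(share) => println!("linking took {:.1}% of the reported time", share),
        None => println!("could not find link/total timings"),
    }
}
```

For the sample numbers, 0.757s out of 3.836s is roughly a fifth of the time rustc reported for that crate.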
According to the official documentation, "LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. [..] When you link a large program on a multicore machine, you can expect that LLD runs more than twice as fast as the GNU gold linker. Your mileage may vary, though."
If you're on Linux you can switch to lld like so:

    [target.x86_64-unknown-linux-gnu]
    rustflags = [
        "-C", "link-arg=-fuse-ld=lld",
    ]
A word of caution: lld might not be working on all platforms yet. At least on macOS, Rust support seems to be broken at the moment, and the work on fixing it has stalled (see rust-lang/rust#39915).
Update: I recently learned about another linker called mold, which claims a massive 12x performance bump over lld. Compared to GNU gold, it's said to be more than 50x. Would be great if anyone could verify and send me a message.
Which one you want to choose depends on your requirements. Which platforms do you need to support? Is it just for local testing or for production usage? mold is optimized for Linux, zld only works on macOS. For production use, lld might be the most mature option.
Rust 1.51 added an interesting flag for faster incremental debug builds on macOS. It can make debug builds several seconds faster (depending on your use-case). Just add this to your Cargo.toml:

    [profile.dev]
    split-debuginfo = "unpacked"
Some engineers report that this flag alone reduces compilation times on macOS by 70%.
The flag might become the standard for macOS soon. It is already the default on nightly.
Rust comes with a huge set of settings for code generation. It can help to look through the list and tweak the parameters for your project.
If you'd like to dig deeper than cargo build --timings, Rust compilation can be profiled with cargo rustc -- -Zself-profile. The resulting trace file can be visualized with a flamegraph or the Chromium profiler.
Another golden one is cargo-llvm-lines, which shows the number of lines generated and objects copied in the LLVM backend:

    $ cargo llvm-lines | head -20
      Lines        Copies         Function name
      -----        ------         -------------
      30737 (100%)   1107 (100%)  (TOTAL)
       1395 (4.5%)     83 (7.5%)  core::ptr::drop_in_place
        760 (2.5%)      2 (0.2%)  alloc::slice::merge_sort
        734 (2.4%)      2 (0.2%)  alloc::raw_vec::RawVec<T,A>::reserve_internal
        666 (2.2%)      1 (0.1%)  cargo_llvm_lines::count_lines
        490 (1.6%)      1 (0.1%)  <std::process::Command as cargo_llvm_lines::PipeTo>::pipe_to
        476 (1.5%)      6 (0.5%)  core::result::Result<T,E>::map
        440 (1.4%)      1 (0.1%)  cargo_llvm_lines::read_llvm_ir
        422 (1.4%)      2 (0.2%)  alloc::slice::merge
        399 (1.3%)      4 (0.4%)  alloc::vec::Vec<T>::extend_desugared
        388 (1.3%)      2 (0.2%)  alloc::slice::insert_head
        366 (1.2%)      5 (0.5%)  core::option::Option<T>::map
        304 (1.0%)      6 (0.5%)  alloc::alloc::box_free
        296 (1.0%)      4 (0.4%)  core::result::Result<T,E>::map_err
        295 (1.0%)      1 (0.1%)  cargo_llvm_lines::wrap_args
        291 (0.9%)      1 (0.1%)  core::char::methods::<impl char>::encode_utf8
        286 (0.9%)      1 (0.1%)  cargo_llvm_lines::run_cargo_rustc
        284 (0.9%)      4 (0.4%)  core::option::Option<T>::ok_or_else
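If one of your own generic functions shows up with a high "Copies" count in a listing like this, one commonly used remedy (not something the tool does for you) is to keep the generic part tiny and move the real work into a non-generic inner function. The sketch below shows the pattern; with_extension is a hypothetical helper invented for this example.

```rust
use std::path::{Path, PathBuf};

// Thin generic wrapper: it gets monomorphized once per caller type,
// but each copy only contains the `as_ref()` conversion and a call.
// `with_extension` is a hypothetical helper for illustration.
pub fn with_extension<P: AsRef<Path>>(path: P, ext: &str) -> PathBuf {
    // Non-generic inner function: compiled exactly once, no matter
    // how many different `P` types the outer function is called with.
    fn inner(path: &Path, ext: &str) -> PathBuf {
        let mut out = path.to_path_buf();
        out.set_extension(ext);
        out
    }
    inner(path.as_ref(), ext)
}

fn main() {
    // Three different argument types, but only one copy of `inner`.
    println!("{}", with_extension("notes.txt", "md").display());
    println!("{}", with_extension(String::from("report.txt"), "md").display());
    println!("{}", with_extension(PathBuf::from("a/b.txt"), "md").display());
}
```

Whether this pays off depends on how large the function body is and how many instantiations exist, so it makes sense to measure with cargo llvm-lines before and after.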
Procedural macros are the hot sauce of Rust development: they burn through CPU cycles, so use them with care.
Update: Over on Twitter Manish pointed out that "the reason proc macros are slow is that the (excellent) proc macro infrastructure – syn and friends – are slow to compile. Using proc macros themselves does not have a huge impact on compile times." (This might change in the future.)
Manish goes on to say:
This basically means that if you use one proc macro, the marginal compile time cost of adding additional proc macros is insignificant. A lot of people end up needing serde in their deptree anyway, so if you are forced to use serde, you should not care about proc macros.
If you are not forced to use serde, one thing a lot of folks do is have serde be an optional dependency so that their types are still serializable if necessary.
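On the Rust side, the optional-serde pattern usually looks something like the sketch below. It assumes your Cargo.toml declares serde as an optional dependency (with serde's derive feature enabled) behind a Cargo feature named "serde". That manifest setup and the Point type are assumptions made for illustration, not something prescribed above.

```rust
// Assumption: Cargo.toml has an optional `serde` dependency (with its
// "derive" feature) that is activated by a Cargo feature named "serde".
// Without that feature, the cfg_attr line below expands to nothing and
// serde (plus its proc-macro stack) is never compiled.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Works with or without the "serde" feature enabled.
    println!("{:?}", p);
}
```

Users who need serialization opt in via the feature; everyone else skips compiling serde altogether.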
If you heavily use procedural macros in your project (e.g., if you use serde), you can try to sidestep their impact on compile times with watt, a tool that offloads macro compilation to WebAssembly.
From the docs:
By compiling macros ahead-of-time to Wasm, we save all downstream users of the macro from having to compile the macro logic or its dependencies themselves.
Instead, what they compile is a small self-contained Wasm runtime (~3 seconds, shared by all macros) and a tiny proc macro shim for each macro crate to hand off Wasm bytecode into the Watt runtime (~0.3 seconds per proc-macro crate you depend on). This is much less than the 20+ seconds it can take to compile complex procedural macros and their dependencies.
Note that this crate is still experimental.
(Oh, and did I mention that both watt and cargo-llvm-lines were built by David Tolnay, who is a frickin' steamroller of an engineer?)
If you reached this point, the easiest way to improve compile times even more is probably to spend money on top-of-the-line hardware.
Perhaps a bit surprisingly, the fastest machines for Rust compiles seem to be Apple machines with an M1 chip.
The benchmarks for the new MacBook Pro with M1 Max are absolutely ridiculous, even in comparison to the already fast M1 Air: that's a solid 2x performance improvement.
But if you'd rather stick with Linux, people have also had great success with a multicore CPU like an AMD Ryzen Threadripper and 32 GB of RAM.
On portable devices, compiling can drain your battery and be slow. To avoid that, I'm using my machine at home, a 6-core AMD FX 6300 with 12GB RAM, as a build machine. I can use it in combination with Visual Studio Code Remote Development.
If you don't have a dedicated machine yourself, you can offload the compilation process to the cloud instead.
Gitpod.io is superb for testing a cloud build as they provide you with a beefy machine (currently 16 core Intel Xeon 2.80GHz, 60GB RAM) for free during a limited period. Simply add https://gitpod.io/# in front of any GitHub URL. Here is an example for one of my Hello Rust episodes.
Gitpod has a neat feature called prebuilds. From their docs:
Whenever your code changes (e.g. when new commits are pushed to your repository), Gitpod can prebuild workspaces. Then, when you do create a new workspace on a branch, or Pull/Merge Request, for which a prebuild exists, this workspace will load much faster, because all dependencies will have been already downloaded ahead of time, and your code will be already compiled.
Especially when reviewing pull requests, this could give you a nice speedup. Prebuilds are quite customizable; take a look at the .gitpod.yml config of nushell to get an idea.
If you have a slow internet connection, a big part of the initial build process is fetching all those shiny crates from crates.io. To mitigate that, you can download all crates in advance to have them cached locally. criner does just that:
    git clone https://github.com/the-lean-crate/criner
    cd criner
    cargo run --release -- mine
The archive size is surprisingly reasonable, with roughly 50GB of required disk space (as of today).
Building Docker images from your Rust code? These can be notoriously slow, because cargo doesn't support building only a project's dependencies yet, invalidating the Docker cache with every build if you don't pay attention.
cargo-chef to the rescue! ⚡
cargo-chef can be used to fully leverage Docker layer caching, therefore massively speeding up Docker builds for Rust projects. On our commercial codebase (~14k lines of code, ~500 dependencies) we measured a 5x speed-up: we cut Docker build times from ~10 minutes to ~2 minutes.
Here is an example Dockerfile if you're interested:
    # Step 1: Compute a recipe file
    FROM rust as planner
    WORKDIR app
    RUN cargo install cargo-chef
    COPY . .
    RUN cargo chef prepare --recipe-path recipe.json

    # Step 2: Cache project dependencies
    FROM rust as cacher
    WORKDIR app
    RUN cargo install cargo-chef
    COPY --from=planner /app/recipe.json recipe.json
    RUN cargo chef cook --release --recipe-path recipe.json

    # Step 3: Build the binary
    FROM rust as builder
    WORKDIR app
    COPY . .
    # Copy over the cached dependencies from above
    COPY --from=cacher /app/target target
    COPY --from=cacher /usr/local/cargo /usr/local/cargo
    RUN cargo build --release --bin app

    # Step 4:
    # Create a tiny output image.
    # It only contains our final binary.
    FROM rust as runtime
    WORKDIR app
    COPY --from=builder /app/target/release/app /usr/local/bin
    ENTRYPOINT ["/usr/local/bin/app"]
cargo-chef can help speed up your continuous integration with GitHub Actions or your deployment process to Google Cloud.
⚠️ Warning: You can damage your hardware if you don't know what you are doing. Proceed at your own risk.
Here's an idea for the desperate. Now I don't recommend that to everyone, but if you have a standalone desktop computer with a decent CPU, this might be a way to squeeze out the last bits of performance.
Even though the Rust compiler executes a lot of steps in parallel, single-threaded performance is still quite relevant.
As a somewhat drastic measure, you can try to overclock your CPU. (I owe you some benchmarks from my machine.)
If you collaborate with others on a Rust project, chances are you use some sort of continuous integration like GitHub Actions. Optimizing a CI build process is a whole subject on its own. Thankfully Aleksey Kladov (matklad) collected a few tips on his blog. He touches on bors, caching, splitting build steps, disabling compiler features like incremental compilation or debug output, and more. It's a great read and you can find it here.
Besides the ongoing work on the compiler itself, Rust tracks compile-time regressions on a website dedicated to performance.
Work is also put into optimizing the LLVM backend. Rumor has it that there's still a lot of low-hanging fruit. 🍇
The Rust team is also continuously working to make the compiler faster. Here's an extract of the 2020 survey:
One continuing topic of importance to the Rust community and the Rust team is improving compile times. Progress has already been made with 50.5% of respondents saying they felt compile times have improved. This improvement was particularly pronounced with respondents with large codebases (10,000 lines of code or more) where 62.6% citing improvement and only 2.9% saying they have gotten worse. Improving compile times is likely to be the source of significant effort in 2021, so stay tuned!
cargo-diet helps you build lean crates that significantly reduce download size (sometimes by 98%). It might not directly affect your own build time, but your users will surely be thankful. 😊
- The Rust Perf Book has a section on compile times.
- Tons of articles on performance on Read Rust.
- 8 Solutions for Troubleshooting Your Rust Build Times is a great article by Dotan Nahum that I fully agree with.
- Improving the build times of a bigger Rust project (lemmy) by 30%.
- arewefastyet (offline) measures how long the Rust compiler takes to compile common Rust programs.
Phew! That was a long list. 😅 If you have any additional tips, please let me know.
If compiler performance is something you're interested in, why not collaborate on a tool to see what user code is causing rustc to use lots of time?