Tips for Faster Rust Compile Times
When it comes to runtime performance, Rust is one of the fastest guns in the west. 🔫 It is on par with the likes of C and C++, and sometimes it even surpasses them. Compile times, however? That's another story.
Below is a list of tips and tricks on how to make your Rust project compile faster today. They are roughly ordered by practicality, so start at the top and work your way down until you're happy and your compiler goes brrrrrrr.
Why Is Rust Compilation Slow?
Wait a sec, slow in comparison to what? That is, if you compare Rust with Go, the Go compiler is doing a lot less work in general. For example, it lacks support for generics and macros. On top of that, the Go compiler was built from scratch as a monolithic toolchain consisting of both the frontend and the backend (rather than relying on, say, LLVM to take over the backend part, which is the case for Rust or Swift). This has advantages (more flexibility when tweaking the entire compilation process, yay) and disadvantages (higher overall maintenance cost and fewer supported architectures).
In general, comparing across different programming languages makes little sense and overall, the Rust compiler is legitimately doing a great job. That said, above a certain project size, the compile times are... let's just say they could be better.
If you'd like to know what's slowing down your builds, run
cargo build --timings
This will generate a report on how much time was spent on each step involved in compiling your program. Here's the output:
Source: Mara Bos via Twitter
Why Bother?
According to the Rust 2019 survey, improving compile times is #4 on the Rust wishlist:
Compile-Time vs Runtime Performance
As is often cautioned in debates among their designers, programming language design is full of tradeoffs. One of those fundamental tradeoffs is runtime performance vs. compile-time performance, and the Rust team nearly always (if not always) chose runtime over compile-time.
– Brian Anderson
Overall, there are a few features and design decisions that limit Rust compilation speed:
- Macros: Code generation with macros can be quite expensive.
- Type checking
- Monomorphization: this is the process of generating specialized versions of generic functions. E.g., a function that takes an `Into<String>` gets converted into one that takes a `String` and one that takes a `&str` (see the sketch after this list).
- LLVM: that's the default compiler backend for Rust, where a lot of the heavy lifting (like code optimizations) takes place. LLVM is notorious for being slow.
- Linking: Strictly speaking, this is not part of compiling but happens right after. It "connects" your Rust binary with the system libraries. `cargo` does not explicitly mark the linking step, so many people add it to the overall compilation time.
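To see monomorphization in action, here's a minimal sketch. The compiler generates a separate copy of the generic function for every concrete type it's called with:

```rust
fn print_len(s: impl Into<String>) {
    println!("{}", s.into().len());
}

fn main() {
    // Two call sites with different types produce two specialized copies:
    print_len("hello");            // instantiated for &str
    print_len(String::from("hi")); // instantiated for String
}
```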
If you're interested in all the gory details, check out this blog post by Brian Anderson.
Update The Rust Compiler And Toolchain
Making the Rust compiler faster is an ongoing process, and many fearless people are working on it. Thanks to their hard work, compiler speed has improved 30-40% across the board year-to-date, with some projects seeing up to 45%+ improvements.
So make sure you use the latest Rust version:
rustup update
On top of that, Rust tracks compile regressions on a website dedicated to performance. Work is also put into optimizing the LLVM backend. Rumor has it that there's still a lot of low-hanging fruit.
Use cargo check Instead Of cargo build
Most of the time, you don't even have to compile your project at all; you just want to know if you messed up somewhere. Whenever you can, skip compilation altogether. What you need instead is laser-fast code linting, type- and borrow-checking.
For that, cargo has a special treat for you: ✨ `cargo check` ✨. Consider the differences in the number of instructions between `cargo check` on the left and a full debug build (`cargo build`) in the middle. (Pay attention to the different scales.)
A sweet trick I use is to run it in the background with `cargo watch`. This way, it will run `cargo check` whenever you change a file.

⭐ Pro-tip: Use `cargo watch -c` to clear the screen before every run.
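If you haven't set it up yet, this is all it takes (using the cargo-watch crate):

```bash
cargo install cargo-watch
# Re-run `cargo check` on every file change; `-c` clears the screen first
cargo watch -c -x check
```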
Use Rust Analyzer Instead Of Rust Language Server (RLS)
Another quick way to check if you set the codebase on fire is to use a "language server". That's basically a "linter as a service" that runs next to your editor.
For a long time, the default choice here was RLS, but lately, folks moved over to rust-analyzer, because it's more feature-complete and way more snappy. It supports all major IDEs. Switching to that alone might save your day.
Remove Unused Dependencies
So let's say you tried all of the above and find that compilation is still slow. What now?
Dependencies sometimes become obsolete thanks to refactoring. From time to time, it helps to check whether all of them are still needed, to save compile time.

If this is your own project (or a project you'd like to contribute to), do a quick check if you can toss anything with cargo-udeps:

cargo install cargo-udeps && cargo +nightly udeps
There also is a newer tool called cargo-machete, which does the same thing but does not require a nightly compiler. It also works better with workspaces.
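Trying it out is quick:

```bash
cargo install cargo-machete
# Report dependencies that are never used in the code
cargo machete
```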
Update Remaining Dependencies
Next, update your dependencies, because they themselves could have tidied up their dependency tree lately.
Take a deep dive with cargo-outdated
or cargo tree
(built right into cargo itself) to find any outdated dependencies. On top of that, use cargo audit
to get notified about any vulnerabilities which need to be addressed, or deprecated crates which need a replacement.
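For example, to scan your lockfile for crates with known security advisories:

```bash
cargo install cargo-audit
# Checks Cargo.lock against the RustSec advisory database
cargo audit
```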
Here's a nice workflow that I learned from /u/oherrala on Reddit:
1. Run `cargo update` to update to the latest semver-compatible versions.
2. Run `cargo outdated -wR` to find newer, possibly incompatible dependencies. Update those and fix code as needed.
3. Find duplicate versions of a dependency and figure out where they come from: `cargo tree --duplicate` shows dependencies which come in multiple versions.

(Thanks to /u/dbdr for pointing this out.)
โญ Pro-tip: Step 3 is a great way to contribute back to the community! Clone the repository and execute steps 1 and 2. Finally, send a pull request to the maintainers.
Replace Heavy Dependencies
From time to time, it helps to shop around for more lightweight alternatives to popular crates.
Again, cargo tree
is your friend here to help you understand which of your dependencies are quite heavy: they require many other crates, cause excessive network I/O and slow down your build. Then search for lighter alternatives.
Also, cargo-bloat
has a --time
flag that shows you the per-crate build time. Very handy!
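To get that per-crate breakdown:

```bash
cargo install cargo-bloat
# Show which crates take the longest to build
cargo bloat --time
```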
Here are a few examples of lighter alternatives:

- Using serde? Check out miniserde and maybe even nanoserde.
- reqwest is quite heavy. Maybe try attohttpc or ureq, which are more lightweight.
- tokio dragging you down? How about smol? (Edit: This won't help much with build times. More info in this discussion on Reddit.)
- Swap out clap with pico-args if you only need basic argument parsing.
Here's an example where switching crates reduced compile times from 2:22min to 26 seconds.
Use Cargo Workspaces
Cargo has that neat feature called workspaces, which allow you to split one big crate into multiple smaller ones. This code-splitting is great for avoiding repetitive compilation because only crates with changes have to be recompiled. Bigger projects like servo and vector are using workspaces heavily to slim down compile times. Learn more about workspaces here.
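As a minimal sketch (the crate names are made up), a workspace is declared in the top-level Cargo.toml:

```toml
[workspace]
members = [
    "app",   # binary crate
    "core",  # domain logic
    "utils", # shared helpers
]
```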
Use Cargo Nextest For Faster Test Execution
It's nice that cargo
comes with its own little test runner, but especially if you have to build multiple test binaries, cargo nextest
can be up to 60% faster than cargo test
thanks to its parallel execution model. Here are some quick benchmarks:
| Project | cargo test (s) | nextest (s) | Difference |
|---|---|---|---|
| meilisearch | 41.04 | 20.62 | -49.8% |
| rust-analyzer | 6.76 | 5.23 | -22.6% |
| tokio | 27.16 | 11.72 | -56.8% |
You can try it with
cargo install cargo-nextest
cargo nextest run
Combine All Integration Tests In A Single Binary
Have any integration tests? (These are the ones in your tests
folder.) Did you know that the Rust compiler will create a binary for every single one of them? And every binary will have to be linked individually. This can take most of your build time because linking is slooow. 🐢 The reason is that many system linkers (like ld
) are single threaded.
A linker is a tool that combines the output of a compiler and mashes that into one executable you can run.
To make the linker's job a little easier, you can put all your tests in one crate. (Basically create a main.rs
in your test folder and add your test files as mod
in there.)
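Here's a minimal sketch with hypothetical module names. Note that the submodules live in subdirectories rather than directly under tests/, because cargo treats every top-level file in tests/ as its own test binary:

```rust
// tests/main.rs -- the single integration test binary
mod api;  // lives in tests/api/mod.rs
mod user; // lives in tests/user/mod.rs
```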
Then the linker will go ahead and build a single binary only. Sounds nice, but careful: it's still a trade-off as you'll need to expose your internal types and functions (i.e. make them pub
).
Might be worth a try, though, because a recent benchmark revealed a 1.9x speedup for one project.
This tip was brought to you by Luca Palmieri, Lucio Franco, and Azriel Hoh. Thanks!
Disable Unused Features Of Crate Dependencies
Check the feature flags of your dependencies. A lot of library maintainers take the effort to split their crate into separate features that can be toggled off on demand. Maybe you don't need all the default functionality from every crate?
For example, tokio
has a ton of features that you can disable if not needed.
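For instance, here's a hedged sketch that turns off tokio's default features and enables only what's needed (adjust the feature list to your use-case):

```toml
[dependencies]
tokio = { version = "1", default-features = false, features = ["rt", "macros"] }
```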
Another example is bindgen, which enables clap
support by default for its binary usage. This isn't needed for library usage, which is the common use-case. Disabling that feature improved compile time of rust-rocksdb by ~13s and ~9s for debug and release builds respectively. Thanks to reader Lilian Anatolie Moraru for mentioning this.
⚠️ Fair warning: it seems that switching off features doesn't always improve compile time. (See tikv's experiences here.) It may still be a good idea for improving security by reducing the code's attack surface.
A quick way to list all features of a crate is cargo-feature-set. More recently, cargo also shows a crate's available features when you add it with `cargo add`.
If you want to look up the feature flags of a crate, they are listed on docs.rs. E.g. check out tokio's feature flags.
Use A Ramdisk For Compilation
๐พ Skip this tip if you're using an SSD.
When starting to compile heavy projects, I noticed that I was throttled on I/O. The reason was that I kept my projects on a measly HDD. A more performant alternative would be SSDs, but if that's not an option, don't throw in the sponge just yet.
Ramdisks to the rescue! These are like "virtual harddisks" that live in system memory.
User moschroe_de shared the following snippet over on Reddit, which creates a ramdisk for your current Rust project (on Linux):
mkdir -p target && \
sudo mount -t tmpfs none ./target && \
cat /proc/mounts | grep tmpfs
On macOS, you could probably do something similar with this script. I haven't tried that myself, though.
Cache Dependencies With sccache
Another neat project is sccache by Mozilla, which caches compiled crates to avoid repeated compilation.
I had this running on my laptop for a while, but the benefit was rather negligible, to be honest. It works best if you work on a lot of independent projects that share dependencies (in the same version). A common use-case is shared build servers.
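Setup is simple: install the binary and tell cargo to wrap rustc with it:

```bash
cargo install sccache
export RUSTC_WRAPPER=sccache
cargo build
```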
Cranelift โ The Alternative Rust Compiler
Lately, I was excited to hear that the Rust project is using an alternative compiler that runs in parallel with rustc
for every CI build: Cranelift, also called CG_CLIF
.
Here is a comparison between rustc
and Cranelift for some popular crates (blue means better):
Somewhat incredulous, I tried to compile vector with both compilers.
The results were astonishing:
- Rustc: 5m 45s
- Cranelift: 3m 13s
I could really notice the difference! What's cool about this is that it creates fully working executable binaries. They won't be optimized as much, but they are great for local development.
A more detailed write-up is on Jason Williams' page, and the project code is on Github.
Switch To A Faster Linker
The thing that nobody seems to target is linking time. For me, when using something with a big dependency tree like Amethyst, for example linking time on my fairly recent Ryzen 7 1700 is ~10s each time, even if I change only some minute detail only in my code. โ /u/Almindor on Reddit
You can check how long your linker takes by running the following commands:
cargo clean
cargo +nightly rustc --bin <your_binary_name> -- -Z time-passes
It will output the timings of each step, including link time:
...
time: 0.000 llvm_dump_timing_file
time: 0.001 serialize_work_products
time: 0.002 incr_comp_finalize_session_directory
time: 0.004 link_binary_check_files_are_writeable
time: 0.614 run_linker
time: 0.000 link_binary_remove_temps
time: 0.620 link_binary
time: 0.622 link_crate
time: 0.757 link
time: 3.836 total
Finished dev [unoptimized + debuginfo] target(s) in 42.75s
If the link
steps account for a big percentage of the build time, consider switching over to a different linker. There are quite a few options.
According to the official documentation, "LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. [..] When you link a large program on a multicore machine, you can expect that LLD runs more than twice as fast as the GNU gold linker. Your mileage may vary, though."
If you're on Linux, you can switch to `lld` by adding this to your `.cargo/config.toml`:

[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
A word of caution: lld
might not be working on all platforms yet. At least on macOS, Rust support seems to be broken at the moment, and the work on fixing it has stalled (see rust-lang/rust#39915).
Update: I recently learned about another linker called mold, which claims a massive 12x performance bump over lld. Compared to GNU gold, it's said to be more than 50x. Would be great if anyone could verify and send me a message.
Update II: Aaand another one called zld, which is a drop-in replacement for Apple's ld
linker and is targeting debug builds. [Source]
Update III: zld is deprecated. The author recommends using lld instead. You can read up on the backstory here.
Which one you want to choose depends on your requirements. Which platforms do you need to support? Is it just for local testing or for production usage?
mold
is optimized for Linux, zld
only works on macOS. For production use, lld
might be the most mature option.
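If you want to give mold a spin on Linux, its wrapper mode means you don't even have to touch your cargo config:

```bash
# mold injects itself as the linker for the wrapped command
mold -run cargo build
```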
Faster Incremental Debug Builds On macOS
Rust 1.51 added an interesting flag for faster incremental debug builds on macOS. It can shave multiple seconds off your debug builds (depending on your use-case). Just add this to your Cargo.toml
:
[profile.dev]
split-debuginfo = "unpacked"
Some engineers report that this flag alone reduces compilation times on macOS by 70%.
The flag might become the standard for macOS soon. It is already the default on nightly.
Tweak More Codegen Options / Compiler Flags
Rust comes with a huge set of settings for code generation. It can help to look through the list and tweak the parameters for your project.
There are many gems in the full list of codegen options. For inspiration, here's bevy's config for faster compilation.
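To give you an idea, here's a hedged sketch of the kind of tweaks you'll find in configs like bevy's: optimize dependencies (which rarely change) while keeping your own crate fast to compile:

```toml
[profile.dev]
opt-level = 1            # light optimization for your own code

[profile.dev.package."*"]
opt-level = 3            # fully optimize all dependencies
```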
Profile Compile Times
If you'd like to dig deeper than cargo build --timings, Rust compilation can be profiled with cargo rustc -- -Zself-profile
. The resulting trace file can be visualized with a flamegraph or the Chromium profiler:
Source: Rust Lang Blog
Another golden one is cargo-llvm-lines
, which shows the number of lines generated and objects copied in the LLVM backend:
$ cargo llvm-lines | head -20
Lines Copies Function name
----- ------ -------------
30737 (100%) 1107 (100%) (TOTAL)
1395 (4.5%) 83 (7.5%) core::ptr::drop_in_place
760 (2.5%) 2 (0.2%) alloc::slice::merge_sort
734 (2.4%) 2 (0.2%) alloc::raw_vec::RawVec<T,A>::reserve_internal
666 (2.2%) 1 (0.1%) cargo_llvm_lines::count_lines
490 (1.6%) 1 (0.1%) <std::process::Command as cargo_llvm_lines::PipeTo>::pipe_to
476 (1.5%) 6 (0.5%) core::result::Result<T,E>::map
440 (1.4%) 1 (0.1%) cargo_llvm_lines::read_llvm_ir
422 (1.4%) 2 (0.2%) alloc::slice::merge
399 (1.3%) 4 (0.4%) alloc::vec::Vec<T>::extend_desugared
388 (1.3%) 2 (0.2%) alloc::slice::insert_head
366 (1.2%) 5 (0.5%) core::option::Option<T>::map
304 (1.0%) 6 (0.5%) alloc::alloc::box_free
296 (1.0%) 4 (0.4%) core::result::Result<T,E>::map_err
295 (1.0%) 1 (0.1%) cargo_llvm_lines::wrap_args
291 (0.9%) 1 (0.1%) core::char::methods::<impl char>::encode_utf8
286 (0.9%) 1 (0.1%) cargo_llvm_lines::run_cargo_rustc
284 (0.9%) 4 (0.4%) core::option::Option<T>::ok_or_else
Avoid Procedural Macro Crates
Procedural macros are the hot sauce of Rust development: they burn through CPU cycles so use with care (keyword: monomorphization).
Update: Over on Twitter, Manish pointed out that "the reason proc macros are slow is that the (excellent) proc macro infrastructure – syn and friends – are slow to compile. Using proc macros themselves does not have a huge impact on compile times." (This might change in the future.)
Manish goes on to say
This basically means that if you use one proc macro, the marginal compile time cost of adding additional proc macros is insignificant. A lot of people end up needing serde in their deptree anyway, so if you are forced to use serde, you should not care about proc macros.
If you are not forced to use serde, one thing a lot of folks do is have
serde
be an optional dependency so that their types are still serializable if necessary.
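A minimal sketch of that pattern in Cargo.toml:

```toml
[dependencies]
serde = { version = "1", features = ["derive"], optional = true }
```

Since optional dependencies implicitly create a feature of the same name, downstream users opt in with `features = ["serde"]`, and you gate your derives behind `#[cfg_attr(feature = "serde", derive(serde::Serialize))]`.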
If you heavily use procedural macros in your project (e.g., if you use serde), it might be worth it to play around with opt-levels in your Cargo.toml
.
[profile.dev.build-override]
opt-level = 3
As reader jfmontanaro mentioned on Github:
I think the reason it helps with build times is because it only applies to build scripts and proc-macros. Build scripts and proc-macros are unique because during a normal build, they are not only compiled but also executed (and in the case of proc-macros, they can be executed repeatedly). When your project uses a lot of proc-macros, optimizing the macros themselves can in theory save a lot of time.
Another approach is to try and sidestep the macro impact on compile times with watt, a tool that offloads macro compilation to Webassembly.
From the docs:
By compiling macros ahead-of-time to Wasm, we save all downstream users of the macro from having to compile the macro logic or its dependencies themselves.
Instead, what they compile is a small self-contained Wasm runtime (~3 seconds, shared by all macros) and a tiny proc macro shim for each macro crate to hand off Wasm bytecode into the Watt runtime (~0.3 seconds per proc-macro crate you depend on). This is much less than the 20+ seconds it can take to compile complex procedural macros and their dependencies.
Note that this crate is still experimental.
(Oh, and did I mention that both watt and cargo-llvm-lines were built by David Tolnay, who is a frickin' steamroller of an engineer?)
Get Dedicated Hardware
If you reached this point, the easiest way to improve compile times even more is probably to spend money on top-of-the-line hardware.
Perhaps a bit surprisingly, the fastest machines for Rust compiles seem to be Apple machines with an M1 chip:
The benchmarks for the new MacBook Pro with M1 Max are absolutely ridiculous – even in comparison to the already fast M1:
| Project | M1 Max | M1 Air |
|---|---|---|
| Deno | 6m11s | 11m15s |
| MeiliSearch | 1m28s | 3m36s |
| bat | 43s | 1m23s |
| hyperfine | 23s | 42s |
| ripgrep | 16s | 37s |
That's a solid 2x performance improvement.
But if you'd rather stick to Linux, people have also had great success with a multicore CPU like an AMD Ryzen Threadripper and 32 GB of RAM.
On portable devices, compiling can drain your battery and be slow. To avoid that, I'm using my machine at home, a 6-core AMD FX 6300 with 12GB RAM, as a build machine. I can use it in combination with Visual Studio Code Remote Development.
Compile in the Cloud
If you don't have a dedicated machine yourself, you can offload the compilation process to the cloud instead.
Gitpod.io is superb for testing a cloud build as they provide you with a beefy machine (currently 16 core Intel Xeon 2.80GHz, 60GB RAM) for free during a limited period. Simply add https://gitpod.io/#
in front of any Github URL. Here is an example for one of my Hello Rust episodes.
Gitpod has a neat feature called prebuilds. From their docs:
Whenever your code changes (e.g. when new commits are pushed to your repository), Gitpod can prebuild workspaces. Then, when you do create a new workspace on a branch, or Pull/Merge Request, for which a prebuild exists, this workspace will load much faster, because all dependencies will have been already downloaded ahead of time, and your code will be already compiled.
Especially when reviewing pull requests, this could give you a nice speedup. Prebuilds are quite customizable; take a look at the .gitpod.yml
config of nushell to get an idea.
Download ALL The Crates
If you have a slow internet connection, a big part of the initial build process is fetching all those shiny crates from crates.io. To mitigate that, you can download all crates in advance to have them cached locally. criner does just that:
git clone https://github.com/the-lean-crate/criner
cd criner
cargo run --release -- mine
The archive size is surprisingly reasonable, with roughly 50GB of required disk space (as of today).
Bonus: Speed Up Rust Docker Builds 🐳

Building Docker images from your Rust code? These can be notoriously slow, because cargo doesn't support building only a project's dependencies yet, which invalidates the Docker cache with every build if you don't pay attention. cargo-chef to the rescue! ⚡
cargo-chef
can be used to fully leverage Docker layer caching, therefore massively speeding up Docker builds for Rust projects. On our commercial codebase (~14k lines of code, ~500 dependencies) we measured a 5x speed-up: we cut Docker build times from ~10 minutes to ~2 minutes.
Here is an example Dockerfile if you're interested:
# Step 1: Compute a recipe file
FROM rust as planner
WORKDIR app
RUN cargo install cargo-chef
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
# Step 2: Cache project dependencies
FROM rust as cacher
WORKDIR app
RUN cargo install cargo-chef
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
# Step 3: Build the binary
FROM rust as builder
WORKDIR app
COPY . .
# Copy over the cached dependencies from above
COPY --from=cacher /app/target target
COPY --from=cacher /usr/local/cargo /usr/local/cargo
RUN cargo build --release --bin app
# Step 4:
# Create a tiny output image.
# It only contains our final binary.
FROM rust as runtime
WORKDIR app
COPY --from=builder /app/target/release/app /usr/local/bin
ENTRYPOINT ["/usr/local/bin/app"]
cargo-chef
can help speed up your continuous integration with Github Actions or your deployment process to Google Cloud.
Drastic Measures: Overclock Your CPU? 🔥

⚠️ Warning: You can damage your hardware if you don't know what you are doing. Proceed at your own risk.
Here's an idea for the desperate. Now I don't recommend that to everyone, but if you have a standalone desktop computer with a decent CPU, this might be a way to squeeze out the last bits of performance.
Even though the Rust compiler executes a lot of steps in parallel, single-threaded performance is still quite relevant.
As a somewhat drastic measure, you can try to overclock your CPU. (I owe you some benchmarks from my machine.)
Speeding Up Your CI Builds
If you collaborate with others on a Rust project, chances are you use some sort of continuous integration like Github Actions. Optimizing a CI build process is a whole subject of its own. Thankfully, Aleksey Kladov (matklad) collected a few tips on his blog. He touches on bors, caching, splitting build steps, disabling compiler features like incremental compilation or debug output, and more. It's a great read and you can find it here.
Upstream Work
The Rust team is also continuously working to make the compiler faster. Here's an extract of the 2020 survey:
One continuing topic of importance to the Rust community and the Rust team is improving compile times. Progress has already been made with 50.5% of respondents saying they felt compile times have improved. This improvement was particularly pronounced with respondents with large codebases (10,000 lines of code or more) where 62.6% citing improvement and only 2.9% saying they have gotten worse. Improving compile times is likely to be the source of significant effort in 2021, so stay tuned!
Help Others: Upload Leaner Crates For Faster Build Times
cargo-diet
helps you build lean crates that significantly reduce download size (sometimes by 98%). It might not directly affect your own build time, but your users will surely be thankful.
More Resources
- The Rust Perf Book has a section on compile times.
- Tons of articles on performance on Read Rust.
- 8 Solutions for Troubleshooting Your Rust Build Times is a great article by Dotan Nahum that I fully agree with.
- Improving the build times of a bigger Rust project (lemmy) by 30%.
- arewefastyet (offline) measures how long the Rust compiler takes to compile common Rust programs.
What's Next?
My company, corrode, can help you with performance problems and reducing your build times. Reach out here.
Phew! That was a long list. If you have any additional tips, please let me know.
If compiler performance is something you're interested in, why not collaborate on a tool to see what user code is causing rustc to use lots of time?
Also, once you're done optimizing your build times, how about optimizing runtimes next? My friend Pascal Hertleif has a nice article on that.
Thanks for reading! I mostly write about Rust and my (open-source) projects. If you would like to receive future posts automatically, you can subscribe via RSS or email.
Thanks to DocWilco, Hendrik Maus, Luca Pizzamiglio for reviewing drafts of this article.