Of Boxes and Trees - Smart Pointers in Rust
Recently, I tried to implement a binary tree data structure in Rust. Each binary tree has a root value, a left, and a right subtree. I started from this Python implementation, which is quite straightforward.
= = =
This allows us to declare a fancy tree object like this:
And the result can be visualized beautifully.
(Yes I've drawn that myself.)
Porting that code to Rust turned out to be a little... challenging. My first attempt looked quite innocuous.
That's pretty much a one-to-one translation of the Python definition — but
rustc says no.
error: recursive type `Tree` has infinite size -/main.rs:1:1 | 1 |
Coming from memory-managed languages (like Python, PHP, or Ruby), I was confused by this. The problem is easy to understand, though. Computers have a limited amount of memory. It's the compiler's job to find out how much memory to allocate for each item.
In our case, it infers the following:
A tree is a structure containing an
i64, and two trees. Each of these trees is a structure containing an
i64, and two trees. Each of these...
You get the idea.
Tree // The next expansion won't fit on the page anymore Tree
Since we don't know how many subtrees our tree will have, there is no way to tell how much memory we need to allocate up front. We'll only know at runtime!
Rust tells us how to fix that: by inserting an indirection like
&. These are different "pointer types" in Rust. They all point to places in memory. So, instead of knowing the total size of our tree structure, we just know the point in memory where the tree is located. But that's enough to define the tree structure. These pointer types allow us to do that safely and without manual memory management. They all offer different guarantees and you should choose the one that fits your requirements best.
&is called a
borrowin Rust speech. It's the most common of the three. It's a reference to some place in memory, but it does not own the data it points to. As such, the lifetime of the borrow depends on its owner. Therefore we would need to add lifetime parameters here. This can make it tedious to use.
Boxis a smart pointer with zero runtime overhead. It owns the data it points to and stores it on the heap. We call it smart because when it goes out of scope, it will first drop the data it points to and then itself. No manual memory management required, which is neat. ✨
Rcis another smart pointer. It's short for "reference-counting". It keeps track of the number of references to the data structure internally. As soon as the number of references is down to zero, it cleans up after itself. Choose
Rcif you need to have multiple owners of the same data in one thread. For multithreading, there's also
Arc(atomic reference count).
All three options are totally valid. Which one you should choose, depends on your use-case. A rule of thumb is to keep it simple. In my case, I chose to use a
Box, because I did not need any special guarantees.
The next problem I faced was that I could not instantiate a tree structure. The left and right subtree have the type
Box<Tree>, but at some point I would need an empty subtree.
In the Python example, I used
None to signal the end of my data structure. Thanks to Rust's
Option type we can do the same:
After all of this, we can create our first tree:
Depending on your point of view, you might say this is either verbose or explicit. Compared to the Python version, it looked a bit too cluttered for my taste.
Can we do better? Chris McDonald helped me come up with the following representation:
.left .right; new
To me, this is much easier on the eye.
Here's the full tree implementation that makes this possible:
root .left .right;
Now you might be wondering why our tree implementation worked so flawlessly in Python. The reason is that Python dynamically allocates memory for the tree object at runtime. Also, it wraps everything inside a PyObject, which is kind of similar to
Rc from above — a reference-counted smart pointer.
Rust is more explicit here. It gives us more flexibility to express our needs but we also need to know about all the possible alternatives to make good use of them. My advice is to stay away from smart pointers if a simple borrow will do.
If lifetimes get in the way however or you need additional guarantees like thread-safety, smart pointers are a great addition to your toolkit. The Rust documentation is a good starting point to learn more about smart pointers. Also, read "Idiomatic tree and graph-like structures in Rust" for some clever use of allocators in case your tree needs to be mutable at runtime.
Thanks for reading! I mostly write about Rust and my (open-source) projects. If you would like to receive future posts automatically, you can subscribe via RSS or email: