Of Boxes and Trees - Smart Pointers in Rust
Recently, I tried to implement a binary tree data structure in Rust. Each binary tree has a root value, a left, and a right subtree. I started from this Python implementation, which is quite straightforward.
=
=
=
This allows us to declare a fancy tree object like this:
=
And the result can be visualized beautifully.
(Yes I’ve drawn that myself.)
Porting that code to Rust turned out to be a little… challenging. My first attempt looked quite innocuous.
That’s pretty much a one-to-one translation of the Python definition — but rustc
says no.
error: recursive type `Tree` has infinite size
-/main.rs:1:1
|
1 |
Coming from memory-managed languages (like Python, PHP, or Ruby), I was confused by this. The problem is easy to understand, though. Computers have a limited amount of memory. It’s the compiler’s job to find out how much memory to allocate for each item.
In our case, it infers the following:
A tree is a structure containing an i64
, and two trees. Each of these trees is a structure containing an i64
, and two trees. Each of these…
You get the idea.
Tree
Tree
// The next expansion won't fit on the page anymore
Since we don’t know how many subtrees our tree will have, there is no way to tell how much memory we need to allocate up front. We’ll only know at runtime!
Rust tells us how to fix that: by inserting an indirection like Box
, Rc
, or &
. These are different “pointer types” in Rust. They all point to places in memory. So, instead of knowing the total size of our tree structure, we just know the point in memory where the tree is located. But that’s enough to define the tree structure. These pointer types allow us to do that safely and without manual memory management. They all offer different guarantees and you should choose the one that fits your requirements best.
&
is called aborrow
in Rust speech. It’s the most common of the three. It’s a reference to some place in memory, but it does not own the data it points to. As such, the lifetime of the borrow depends on its owner. Therefore we would need to add lifetime parameters here. This can make it tedious to use.Box
is a smart pointer with zero runtime overhead. It owns the data it points to and stores it on the heap. We call it smart because when it goes out of scope, it will first drop the data it points to and then itself. No manual memory management required, which is neat. ✨Rc
is another smart pointer. It’s short for “reference-counting”. It keeps track of the number of references to the data structure internally. As soon as the number of references is down to zero, it cleans up after itself. ChooseRc
if you need to have multiple owners of the same data in one thread. For multithreading, there’s alsoArc
(atomic reference count).
Putting the tree into a box
All three options are totally valid. Which one you should choose, depends on your use-case. A rule of thumb is to keep it simple. In my case, I chose to use a Box
, because I did not need any special guarantees.
Making subtrees optional
The next problem I faced was that I could not instantiate a tree structure. The left and right subtree have the type Box<Tree>
, but at some point I would need an empty subtree.
In the Python example, I used None
to signal the end of my data structure. Thanks to Rust’s Option
type we can do the same:
After all of this, we can create our first tree:
Tree ;
Depending on your point of view, you might say this is either verbose or explicit. Compared to the Python version, it looked a bit too cluttered for my taste.
Can we do better? Chris McDonald helped me come up with the following representation:
new
.left
.right;
To me, this is much easier on the eye.
Here’s the full tree implementation that makes this possible:
Update: Danny Grein mentioned on Twitter, that we can support the following syntax by implementing From<i64> for Tree
:
root
.left
.right;
Why did it just work in Python?
Now you might be wondering why our tree implementation worked so flawlessly in Python. The reason is that Python dynamically allocates memory for the tree object at runtime. Also, it wraps everything inside a PyObject, which is kind of similar to Rc
from above — a reference-counted smart pointer.
Rust is more explicit here. It gives us more flexibility to express our needs but we also need to know about all the possible alternatives to make good use of them. My advice is to stay away from smart pointers if a simple borrow will do.
If lifetimes get in the way however or you need additional guarantees like thread-safety, smart pointers are a great addition to your toolkit. The Rust documentation is a good starting point to learn more about smart pointers. Also, read “Idiomatic tree and graph-like structures in Rust” for some clever use of allocators in case your tree needs to be mutable at runtime.
- 💬 Comments on Reddit.
Thanks for reading! I mostly write about Rust and my (open-source) projects. If you would like to receive future posts automatically, you can subscribe via RSS.