Launching a URL Shortener in Rust using Rocket

Tagged with dev, oss, rust

One common systems design task in interviews is to sketch the software architecture of a URL shortener (a bit.ly clone, if you will). Since I was playing around with Rocket – a web framework for Rust – why not give it a try?

A rocket travelling through space

Requirements

A URL shortener has two main responsibilities:

  • Create a short URL for a longer one (d'oh!).
  • Redirect to the longer link when the short link is requested.

Let's call our service rust.ly (Hint, hint: the domain is still available at the time of writing...).

First, let's create a new Rust project:

cargo new --bin rustly

Next, we add Rocket to our Cargo.toml:

[dependencies]
rocket = "0.2.4"
rocket_codegen = "0.2.4"

Warning: Most likely you need to get the very newest Rocket version. Otherwise, you might get some... entertaining error messages. Find the newest version on crates.io.

Since Rocket requires cutting-edge Rust features, we need to use a recent nightly build. Rustup provides a simple way to switch between stable and nightly.

🤔 Nightly Rust might no longer be required. Has anyone tried without and can report back?

rustup update && rustup override set nightly

A first prototype

Now we can start coding our little service. First, let's write a simple "hello world" skeleton to get started. Put this into src/main.rs:

#![feature(plugin)]
#![plugin(rocket_codegen)]

extern crate rocket;

#[get("/<id>")]
fn lookup(id: &str) -> String {
    format!("⏩ You requested {}. Wonderful!", id)
}

#[get("/<url>")]
fn shorten(url: &str) -> String {
    format!("💾 You shortened {}. Magnificent!", url)
}

fn main() {
    rocket::ignite().mount("/", routes![lookup])
                    .mount("/shorten", routes![shorten])
                    .launch();
}

Under the hood, Rocket is doing some magic to enable this nice syntax. More specifically, we use the rocket_codegen crate for that.

In order to bring the rocket library into scope, we write extern crate rocket;.

We defined the two routes for our service. Both routes will respond to a GET request.
This is done by adding an attribute named get to a function. The attribute can take additional arguments. In our case, we define an id variable for the lookup endpoint and a url variable for the shorten endpoint. Both variables are Unicode string slices. Since Rust has awesome Unicode support, we respond with a nice emoji just to show off. 🕶

Lastly, we need a main function, which launches Rocket and mounts our two routes. This way, they become publicly available. If you want to know more about the in-depth details, have a look at the official Rocket documentation.

Let's check if we're on the right track by running the application.

cargo run

After some compiling, you should get some lovely startup output from Rocket:

🔧  Configured for development.
    => address: localhost
    => port: 8000
    => log: normal
    => workers: 8
🛰  Mounting '/':
    => GET /<id>
🛰  Mounting '/shorten':
    => GET /shorten/<url>
🚀  Rocket has launched from http://localhost:8000...

Sweet! Let's call our service.

> curl localhost:8000/shorten/www.endler.dev
💾 You shortened www.endler.dev. Magnificent!

> curl localhost:8000/www.endler.dev
⏩ You requested www.endler.dev. Wonderful!

So far so good.

Data storage and lookup

We need to keep the shortened URLs over many requests... but how? In a production scenario, we could use some NoSQL data store like Redis for that. Since the goal is to play with Rocket and learn some Rust, we will simply use an in-memory store.

Rocket has a feature for that called managed state. In our case, we want to manage a repository of URLs.

First, let's create a file named src/repository.rs:

use std::collections::HashMap;
use shortener::Shortener;

pub struct Repository {
    urls: HashMap<String, String>,
    shortener: Shortener,
}

impl Repository {
    pub fn new() -> Repository {
        Repository {
            urls: HashMap::new(),
            shortener: Shortener::new(),
        }
    }

    pub fn store(&mut self, url: &str) -> String {
        let id = self.shortener.next_id();
        self.urls.insert(id.to_string(), url.to_string());
        id
    }

    pub fn lookup(&self, id: &str) -> Option<&String> {
        self.urls.get(id)
    }
}

Within this module we first import the HashMap implementation from the standard library. We also include shortener::Shortener;, which helps us shorten the URLs in the next step. Don't worry too much about that for now. By convention, we implement a new() method to create a Repository struct with an empty HashMap and a new Shortener. Additionally, we have two methods, store and lookup.

store takes a URL and writes it to our in-memory HashMap storage. It uses our yet-to-be-defined shortener to create a unique ID and returns that ID for the entry. lookup fetches the URL stored under a given ID and returns it as an Option. If the ID is found, the return value will be Some(url); if there is no match, it will return None.
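
To make the two methods a bit more tangible, here is a minimal sketch of the same store/lookup behaviour that compiles on its own. It swaps the yet-to-be-written Shortener for a plain counter; the CounterRepository name and the decimal ids are assumptions made up for this illustration, not part of the final service.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Repository: ids come from a plain counter
// instead of the Shortener that we write in the next step.
pub struct CounterRepository {
    urls: HashMap<String, String>,
    next_id: u64,
}

impl CounterRepository {
    pub fn new() -> CounterRepository {
        CounterRepository {
            urls: HashMap::new(),
            next_id: 0,
        }
    }

    // Store a URL under a fresh id and return that id.
    pub fn store(&mut self, url: &str) -> String {
        let id = self.next_id.to_string();
        self.next_id += 1;
        self.urls.insert(id.clone(), url.to_string());
        id
    }

    // Some(url) for a known id, None otherwise.
    pub fn lookup(&self, id: &str) -> Option<&String> {
        self.urls.get(id)
    }
}
```

Storing a URL hands back an id, and looking that id up gives us Some(url) again. Any id we never handed out yields None.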

Note that we convert the string slices (&str) to String using the to_string() method. This way we don't need to deal with lifetimes. As a beginner, don't think too hard about them.

Additional remarks (can safely be skipped)

A seasoned (Rust) developer™ might do a few things differently here. Did you notice the tight coupling between the repository and the shortener? In a production system, Repository and Shortener might simply be concrete implementations of traits (which are a bit like interfaces in other languages, but more powerful). For example, Repository could implement a Cache trait:

trait Cache {
    // Store an entry and return an ID
    fn store(&mut self, data: &str) -> String;
    // Look up a previously stored entry
    fn lookup(&self, id: &str) -> Option<&String>;
}

This way we get a clear separation of concerns, and we can easily switch to a different implementation (e.g. a RedisCache). Also, we could have a MockRepository to simplify testing. Same for Shortener.
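
As a rough sketch of what that buys us: the code below repeats the Cache trait so that it compiles on its own and adds a throwaway mock. MockCache, its fixed id "mock", and the short_link helper are all hypothetical names invented for this example.

```rust
// The Cache trait from above, repeated so the sketch is self-contained.
trait Cache {
    // Store an entry and return an ID
    fn store(&mut self, data: &str) -> String;
    // Look up a previously stored entry
    fn lookup(&self, id: &str) -> Option<&String>;
}

// Hypothetical mock: remembers only the last entry and always
// hands out the same id. Good enough for a unit test.
struct MockCache {
    last: Option<String>,
}

impl Cache for MockCache {
    fn store(&mut self, data: &str) -> String {
        self.last = Some(data.to_string());
        "mock".to_string()
    }

    fn lookup(&self, id: &str) -> Option<&String> {
        if id == "mock" {
            self.last.as_ref()
        } else {
            None
        }
    }
}

// Written against the trait, so it works with the mock,
// an in-memory repository, or a Redis-backed cache alike.
fn short_link<C: Cache>(cache: &mut C, url: &str) -> String {
    format!("https://rust.ly/{}", cache.store(url))
}
```

The interesting part is short_link: it never learns which implementation it is talking to, so a test can pass in the mock while production code passes in the real thing.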

On top of that, you might want to use the Into trait to support both &str and String as parameters of store:

pub fn store<T: Into<String>>(&mut self, url: T) -> String {
    let id = self.shortener.next_id();
    self.urls.insert(id.clone(), url.into());
    id
}
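
If you want to try the Into<String> bound in isolation, here is a small sketch that does not depend on the rest of the project. UrlList and its add method are hypothetical and exist only for this demonstration.

```rust
// Hypothetical container used only to demonstrate the Into<String> bound.
pub struct UrlList {
    urls: Vec<String>,
}

impl UrlList {
    pub fn new() -> UrlList {
        UrlList { urls: Vec::new() }
    }

    // Accepts a &str (copied into a new String) as well as a String
    // (moved in as-is). Returns the number of stored URLs.
    pub fn add<T: Into<String>>(&mut self, url: T) -> usize {
        self.urls.push(url.into());
        self.urls.len()
    }
}
```

Both list.add("https://www.endler.dev") and list.add(String::from("https://www.endler.dev")) compile. The caller decides whether to hand over an owned String or just a slice.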

If you're curious about this, read this article from Herman J. Radtke III. For now, let's keep it simple.

Actually shortening URLs

Let's implement the URL shortener itself. You might be surprised how much was written about URL shortening all over the web. One common way is to create short URLs using base 62 conversion.
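
In case you have never seen base 62 conversion, here is a minimal sketch using only the standard library. The function name base62_encode and the alphabet order (digits, then lowercase, then uppercase) are arbitrary choices for this example. It is not what we will end up using in the service.

```rust
// 10 digits + 26 lowercase + 26 uppercase letters = 62 symbols.
// The order is an arbitrary choice for this sketch.
const ALPHABET: &[u8] = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Convert a number into its base 62 representation.
pub fn base62_encode(mut n: u64) -> String {
    let base = ALPHABET.len() as u64;
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        // The remainder selects the next symbol, least significant first.
        digits.push(ALPHABET[(n % base) as usize]);
        n /= base;
    }
    digits.reverse();
    // Every byte comes from ALPHABET, so this is always valid ASCII.
    String::from_utf8(digits).expect("alphabet is ASCII")
}
```

With this alphabet, 61 becomes "Z" and 62 becomes "10", so even a database id in the millions fits into a handful of characters.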

After looking around some more, I found this sweet little crate called harsh, which perfectly fits the bill. It creates a short hash id from one or more unsigned integers.

To use harsh, we add it to the dependency section of our Cargo.toml:

harsh = "0.1.2"

Next, we add the crate to the top of our main.rs:

extern crate harsh;

Let's create a new file named src/shortener.rs and write the following:

use harsh::{Harsh, HarshBuilder};

pub struct Shortener {
    id: u64,
    generator: Harsh,
}

impl Shortener {
    pub fn new() -> Shortener {
        let harsh = HarshBuilder::new().init().unwrap();
        Shortener {
            id: 0,
            generator: harsh,
        }
    }

    pub fn next_id(&mut self) -> String {
        let hashed = self.generator.encode(&[self.id]).unwrap();
        self.id += 1;
        hashed
    }
}

With use harsh::{Harsh, HarshBuilder}; we bring the required structs into scope. Then we define our own Shortener struct, which wraps Harsh. It has two fields: id stores the next id for shortening. (Since there won't be any negative ids, we use an unsigned integer for that.) The other field is the generator itself, for which we use Harsh. Using the HarshBuilder you can do a lot of fancy stuff, like setting a custom alphabet for the ids. We're good for now, but for more info, check out the official docs. With next_id we retrieve a new String id for our URLs.

As you can see, we don't pass the URL to next_id. That means we actually don't shorten anything. We merely create a short, unique ID. That's because most hashing algorithms produce fairly long outputs, and having short URLs is kind of the whole idea.

Wiring it up

So we are done with our shortener and the repository. We need to adjust our src/main.rs again to make use of the two.

This is the point where it gets a little hairy.

I have to admit that I struggled a bit here. Mainly because I was not used to multi-threaded request handling. In Python or PHP you don't need to think about shared-mutable access.

Initially I had the following code in my main.rs:

#[get("/<url>")]
fn store(repo: State<Repository>, url: &str) {
    repo.store(url);
}

fn main() {
    rocket::ignite().manage(Repository::new())
                    .mount("/store", routes![store])
                    .launch();
}

State is the built-in way to save data across requests in Rocket. Just tell it what belongs to your application state with manage() and Rocket will automatically inject it into the routes.

But the compiler said no:

error: cannot borrow immutable borrowed content as mutable
  --> src/main.rs
   |
   |     repo.store(url);
   |     ^^^^ cannot borrow as mutable

In hindsight it all makes sense: What would happen if two requests wanted to modify our repository at the same time? Rust prevented a race condition here! Yikes. Admittedly, the error message could have been a bit more user-friendly, though.

Fortunately, Sergio Benitez (the creator of Rocket) helped me out on the Rocket IRC channel (thanks again!). The solution was to put the repository behind a lock, such as a Mutex or the RwLock used below.

Here is our src/main.rs in its full glory:

#![feature(plugin, custom_derive)]
#![plugin(rocket_codegen)]

extern crate rocket;
extern crate harsh;

use std::sync::RwLock;
use rocket::State;
use rocket::request::Form;
use rocket::response::Redirect;

mod repository;
mod shortener;
use repository::Repository;

#[derive(FromForm)]
struct Url {
    url: String,
}

#[get("/<id>")]
fn lookup(repo: State<RwLock<Repository>>, id: &str) -> Result<Redirect, &'static str> {
    match repo.read().unwrap().lookup(id) {
        Some(url) => Ok(Redirect::permanent(url)),
        _ => Err("Requested ID was not found.")
    }
}

#[post("/", data = "<url_form>")]
fn shorten(repo: State<RwLock<Repository>>, url_form: Form<Url>) -> Result<String, String> {
    let ref url = url_form.get().url;
    let mut repo = repo.write().unwrap();
    let id = repo.store(&url);
    Ok(id.to_string())
}

fn main() {
    rocket::ignite().manage(RwLock::new(Repository::new()))
                    .mount("/", routes![lookup, shorten])
                    .launch();
}

As you can see, we're using a std::sync::RwLock here to protect our repository from shared mutable access. This type of lock allows any number of readers or at most one writer at the same time. It makes our code a bit harder to read because whenever we want to access our repository, we need to call the read or write method first.
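
To see the read/write dance without Rocket in the way, here is a small sketch using only the standard library. The function store_concurrently is hypothetical, and the Arc merely stands in for the sharing that Rocket's State takes care of in the real service.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical demo: one thread per URL writes into a shared map,
// then we read the map and report how many entries made it in.
pub fn store_concurrently(urls: Vec<String>) -> usize {
    let shared = Arc::new(RwLock::new(HashMap::new()));

    let handles: Vec<_> = urls
        .into_iter()
        .enumerate()
        .map(|(i, url)| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // write() blocks until this thread has exclusive access.
                let mut map = shared.write().unwrap();
                map.insert(i, url);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // read() hands out shared access; many readers may hold it at once.
    let map = shared.read().unwrap();
    map.len()
}
```

No matter how the threads interleave, every insert happens under the write lock, so no entry gets lost.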

In our lookup method, you can see that we are returning a Result type now. It has two cases: if we find an id in our repository, we return Ok(Redirect::permanent(url)), which will take care of the redirect. If we can't find the id, we return an Err with a short message.
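
Stripped of the Rocket types, the same Option-to-Result pattern looks roughly like this. The resolve helper is hypothetical and works on a bare HashMap instead of our Repository.

```rust
use std::collections::HashMap;

// Hypothetical helper mirroring the match in lookup, minus the redirect:
// a found URL becomes Ok, a missing one becomes Err with a message.
pub fn resolve<'a>(
    urls: &'a HashMap<String, String>,
    id: &str,
) -> Result<&'a str, &'static str> {
    match urls.get(id) {
        Some(url) => Ok(url.as_str()),
        None => Err("Requested ID was not found."),
    }
}
```

The caller has to deal with both cases explicitly, which is exactly what we want for a lookup that can fail.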

In our shorten method, we switched from a get to a post request. The advantage is that we don't need to deal with URL encoding. We just create a struct Url and derive FromForm for it, which will handle the deserialization for us. Fancy!

We're done. Let's fire up the service again and try it out!

cargo run

In a new window, we can now store our first URL:

curl --data "url=https://www.endler.dev" http://localhost:8000/

We get some ID back that we can use to retrieve the URL again. In my case, this was gY. Point your browser to http://localhost:8000/gY and you should be redirected to my homepage.

Summary

Rocket provides fantastic documentation and a great community. It really feels like an idiomatic Rustlang web framework.

I hope you had some fun while playing with Rocket.
You can find the full example code on GitHub.
