fastcat - A Faster `cat` Implementation Using Splice
Lots of people asked me to write another piece about the internals of well-known Unix commands. Well, actually, nobody asked, but it makes for a good intro. I'm sure you've read the previous parts about `ls`; they are epic.
Anyway, today we talk about `cat`, which is used to concatenate files or, more commonly, abused to print a file's contents to the screen.
# Concatenate files, the intended purpose
# Print file to screen, the most common use-case
Here's a naive `cat` in Ruby:
This program goes through each file and prints its contents line by line. Easy peasy! But wait, how fast is this tool?
I quickly created a random 2 GB file for the benchmark.
Let's compare the speed of our naive implementation with the system one using the awesome pv (Pipe Viewer) tool. All tests are averaged over five runs on a warm cache (file in memory).
# Ruby 2.5.1
> ./rubycat myfile | pv -r > /dev/null
Not bad, I guess? How does it compare with my system's cat?
cat myfile | pv -r > /dev/null
Uh oh, GNU cat is ten times faster than our little Ruby cat. 💎🐈🐌
Our naive Ruby code can be tweaked a bit. Turns out line buffering hurts performance in the end [1]:
rubycat myfile | pv -r > /dev/null
Wow... we didn't really try hard, and we're already approaching the speed of a tool that has been optimized since 1971. 🎉
But before we celebrate too much, let's see if we can go even faster.
What initially motivated me to write about `cat` was this comment by user wahern on Hacker News:
> I'm surprised that neither GNU yes nor GNU cat uses splice(2).
Could this splice thing make printing files even faster? I was intrigued.
splice() moves data between two file descriptors without copying between kernel address space and user address space. It transfers up to `len` bytes of data from the file descriptor `fd_in` to the file descriptor `fd_out`, where one of the file descriptors must refer to a pipe.
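If you really want to dig deeper, here's the corresponding source code from the Linux kernel, but we don't need to know all the nitty-gritty details for now. Instead, we can just inspect the header from the C implementation, as documented in splice(2): `ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);`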
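To break it down even more, here's how we would copy the entire `src` file to `dst` (both are file descriptors, and `size` is the number of bytes to move):
const ssize_t r = splice(src, NULL, dst, NULL, size, 0);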
The cool thing about this is that all of it happens inside the Linux kernel, which means we won't copy a single byte to userspace (where our program runs). Ideally, splice works by remapping pages and does not actually copy any data, which may improve I/O performance (reference).
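To make that one-liner a bit more concrete, here is a minimal C sketch. It assumes glibc on Linux, and the helper name `splice_exact` is made up for illustration. Since splice can move fewer bytes than requested, the sketch loops until `size` bytes have been moved or the input ends. Keep in mind that one of the two descriptors has to be a pipe, otherwise the call fails with EINVAL.

```c
#define _GNU_SOURCE            /* needed for splice() on glibc; must precede the includes */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to `size` bytes from `src` to `dst`
 * inside the kernel. One of the two descriptors must be a pipe.
 * Returns the number of bytes moved, or -1 on error.
 *
 * Note: if `dst` is a pipe that nobody reads from, this blocks as
 * soon as the pipe is full.
 */
ssize_t splice_exact(int src, int dst, size_t size)
{
    size_t moved = 0;

    while (moved < size) {
        ssize_t r = splice(src, NULL, dst, NULL, size - moved, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal, try again */
            return -1;
        }
        if (r == 0)
            break;             /* end of input */
        moved += (size_t)r;
    }
    return (ssize_t)moved;
}
```

For instance, with `src` an open file and `dst` the write end of a pipe, `splice_exact(src, dst, 5)` moves the first five bytes of the file into the pipe.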
I have to say I'm not a C programmer and I prefer Rust because it offers a safer interface. Here's the same thing in Rust:
Now, I didn't implement the Linux bindings myself. Instead, I just used a library called nix, which provides Rust-friendly bindings to *nix APIs.
There is one caveat, though: We cannot really copy the file directly to standard out, because splice requires one file descriptor to be a pipe. The way around that is to create a pipe, which consists of a reader and a writer (`rd` and `wr`). We pipe the file into the writer, and then we read from the pipe and push the data to stdout.
You can see that I use a relatively big buffer of 16384 bytes (2^14) to improve performance.
extern crate nix;
const BUF_SIZE: usize = 16384;
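For comparison, the same pipe-based loop can be sketched in C. This is not the fcat source: the function name `splice_cat` is made up, error handling is kept to a minimum, and it assumes glibc on Linux. Only the buffer size of 16384 bytes is taken from the text above.

```c
#define _GNU_SOURCE            /* needed for splice() on glibc; must precede the includes */
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 16384         /* 2^14, the chunk size mentioned above */

/*
 * Hypothetical helper: copy everything from in_fd to out_fd through
 * an intermediate pipe, so that every splice call has a pipe on one
 * side. Returns 0 on success, -1 on error.
 */
int splice_cat(int in_fd, int out_fd)
{
    int p[2];                  /* p[0] is the reader, p[1] the writer */
    int rc = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        /* Step 1: file -> pipe */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, BUF_SIZE, 0);
        if (n == 0)
            break;             /* nothing left to read, we are done */
        if (n < 0) {
            rc = -1;
            break;
        }

        /* Step 2: pipe -> output, until this chunk is drained */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, 0);
            if (m <= 0) {
                rc = -1;
                break;
            }
            n -= m;
        }
        if (rc < 0)
            break;
    }

    close(p[0]);
    close(p[1]);
    return rc;
}
```

Calling `splice_cat(fd, STDOUT_FILENO)` once per input file would give a bare-bones cat. This mirrors what `fcat` does through the nix bindings, as described above.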
So, how fast is this?
fcat myfile | pv -r > /dev/null
Holy guacamole. That's over three times as fast as system cat.
- Linux and Android are fully supported.
- OpenBSD also has some sort of splice implementation called `sosplice`. I didn't test that, though.
- On macOS, the closest thing to splice is its bigger brother, sendfile, which can send a file to a socket within the kernel. Unfortunately, it does not support sending from file to file [2]. There's also `copyfile`, which has a similar interface, but unfortunately, it is not zero-copy. (I thought so in the beginning, but I was wrong.)
- Windows doesn't provide zero-copy file-to-file transfer (only file-to-socket transfer using the TransmitFile API).
Nevertheless, in a production-grade implementation, the splice support could be activated on systems that support it, while using a generic implementation as a fallback.
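A rough C sketch of such a fallback could look like the following. It is only an illustration under a few assumptions: the names `copy_fd` and `copy_read_write` are made up, the splice path is compiled in on Linux only, and any splice failure is treated as "not supported here" rather than inspected in detail. The generic path is a plain read/write loop through a userspace buffer.

```c
#define _GNU_SOURCE            /* needed for splice() on glibc; must precede the includes */
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 16384

/* Generic path: copy through a userspace buffer with read(2)/write(2). */
static int copy_read_write(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)
            return 0;          /* end of input */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w <= 0)
                return -1;
            off += w;
        }
    }
}

/* Hypothetical entry point: splice where possible, generic loop otherwise. */
int copy_fd(int in_fd, int out_fd)
{
#ifdef __linux__
    int p[2];

    if (pipe(p) == 0) {
        for (;;) {
            ssize_t n = splice(in_fd, NULL, p[1], NULL, BUF_SIZE, 0);
            if (n <= 0) {
                /* The pipe is empty at this point, so nothing is lost. */
                close(p[0]);
                close(p[1]);
                if (n == 0)
                    return 0;  /* end of input */
                break;         /* input cannot be spliced: use generic path */
            }
            while (n > 0) {
                ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, 0);
                if (m <= 0)
                    break;
                n -= m;
            }
            if (n > 0) {
                /* Output cannot be spliced: flush the bytes still sitting
                   in the pipe by hand, then continue with the generic path. */
                close(p[1]);
                int rc = copy_read_write(p[0], out_fd);
                close(p[0]);
                if (rc < 0)
                    return -1;
                break;
            }
        }
    }
#endif
    return copy_read_write(in_fd, out_fd);
}
```

One detail worth pointing out: when the output side refuses splice (for example a file opened with O_APPEND), a chunk has already been moved into the pipe, so the sketch drains the pipe before switching to the generic loop.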
I have no idea. Probably you don't, because your bottleneck is somewhere else. That said, many people use `cat` for piping data into another process like
# Count all lines in C files
cat *.c | wc -l
In this case, if you notice that `cat` is the bottleneck, try `fcat` (but first, try to avoid `cat` altogether).
With some more work, `fcat` could also be used to directly route packets from one network card to another, similar to netcat.
- The closer we get to bare metal, the more our hard-won abstractions fall apart, and we are back to low-level systems programming.
- Apart from a fast cat, there's also a use-case for a slow cat: old computers. For that purpose, there's (you guessed it) slowcat.
That said, I still have no idea why GNU cat does not use splice on Linux. 🤔 The source code for fcat is on GitHub. Contributions welcome!