fastcat - A Faster `cat` Implementation Using Splice
Lots of people asked me to write another piece about the internals of well-known Unix commands. Well, actually, nobody asked, but it makes for a good intro. I’m sure you’ve read the previous parts about `yes` and `ls`; they are epic.
Anyway, today we talk about `cat`, which is used to concatenate files or, more commonly, abused to print a file’s contents to the screen.
# Concatenate files, the intended purpose (file names are placeholders)
cat file1 file2 > combined

# Print file to screen, the most common use-case
cat myfile
Implementing cat
Here’s a naive `cat` in Ruby:
#!/usr/bin/env ruby

# Print every file named in args, one line at a time.
def cat(args)
  args.each do |arg|
    IO.foreach(arg) do |line|
      puts line
    end
  end
end

cat(ARGV)
This program goes through each file and prints its contents line by line. Easy peasy! But wait, how fast is this tool?
I quickly created a random 2 GB file for the benchmark.
Let’s compare the speed of our naive implementation with the system one using the awesome pv (Pipe Viewer) tool. All tests are averaged over five runs on a warm cache (file in memory).
# Ruby 2.5.1
> ./rubycat myfile | pv -r > /dev/null
[196MiB/s]
Not bad, I guess? How does it compare with my system’s cat?
cat myfile | pv -r > /dev/null
[1.90GiB/s]
Uh oh, GNU cat is ten times faster than our little Ruby cat. 💎🐈🐌
Making our Ruby cat a little faster
Our naive Ruby code can be tweaked a bit. Turns out line buffering hurts performance in the end [1]:
#!/usr/bin/env ruby

# Copy each file to standard output as a stream instead of line by line.
def cat(args)
  args.each do |arg|
    IO.copy_stream(arg, STDOUT)
  end
end

cat(ARGV)
rubycat myfile | pv -r > /dev/null
[1.81GiB/s]
Wow… we didn’t really try hard, and we’re already approaching the speed of a tool that has been optimized since 1971. 🎉
But before we celebrate too much, let’s see if we can go even faster.
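To make the difference between the two Ruby versions a bit more tangible, here is a rough sketch of the "copy in blocks instead of lines" idea in Rust, the language used later in this post. It is not what `IO.copy_stream` does internally; `copy_blocks` and the 64 KiB block size are names and values made up for this illustration.

```rust
use std::io::{self, Read, Write};

/// Copy everything from `src` to `dst` in fixed-size blocks and return the
/// number of bytes copied. No attempt is made to find line boundaries.
fn copy_blocks<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    block_size: usize,
) -> io::Result<u64> {
    let mut buf = vec![0u8; block_size];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // A signal interrupted the read; simply try again.
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    for path in std::env::args().skip(1) {
        let mut file = std::fs::File::open(&path)?;
        // 64 KiB is an arbitrary choice for this sketch.
        copy_blocks(&mut file, &mut out, 64 * 1024)?;
    }
    Ok(())
}
```

Every block still travels from the kernel into `buf` and back again, which is exactly the copy that the next section tries to get rid of.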
Splice
What initially motivated me to write about `cat` was this comment by user wahern on Hacker News:
I’m surprised that neither GNU yes nor GNU cat uses splice(2).
Could this splice thing make printing files even faster? — I was intrigued.
Splice was first introduced to the Linux Kernel in 2006, and there is a nice summary from Linus Torvalds himself, but I prefer the description from the manpage:
splice() moves data between two file descriptors without copying between kernel address space and user address space. It transfers up to len bytes of data from the file descriptor fd_in to the file descriptor fd_out, where one of the file descriptors must refer to a pipe.
If you really want to dig deeper, here’s the corresponding source code from the Linux Kernel, but we don’t need to know all the nitty-gritty details for now. Instead, we can just inspect the header from the C implementation:
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);
To break it down even more, here’s how we would copy the entire `src` file to `dst`, where `size` is the number of bytes to transfer:
const ssize_t r = splice(src, NULL, dst, NULL, size, 0);
The cool thing about this is that all of it happens inside the Linux kernel, which means we won’t copy a single byte to userspace (where our program runs). Ideally, splice works by remapping pages and does not actually copy any data, which may improve I/O performance (reference).
Using splice from Rust
I have to say I’m not a C programmer and I prefer Rust because it offers a safer interface. Here’s the same thing in Rust:
Now, I didn’t implement the Linux bindings myself. Instead, I just used a library called nix, which provides Rust-friendly bindings to *nix APIs.
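If you only want to poke at the system call without adding a dependency, it can also be declared by hand. The following is a minimal, Linux-only sketch under that assumption, not the code fcat uses: the `extern` declarations mirror the C header shown earlier, and `make_pipe`, `splice_into_pipe` and `head_via_splice` are hypothetical helper names. The destination is the write end of a pipe, because the manpage requires one side to be a pipe.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::raw::{c_int, c_uint};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::ptr;

// Hand-written declarations of two libc functions. This is an assumption of
// the sketch; the article itself goes through the nix crate instead.
unsafe extern "C" {
    fn splice(
        fd_in: c_int,
        off_in: *mut i64,
        fd_out: c_int,
        off_out: *mut i64,
        len: usize,
        flags: c_uint,
    ) -> isize;
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Create a pipe and return its (read end, write end) as owned `File`s.
fn make_pipe() -> io::Result<(File, File)> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: both descriptors were just created and have no other owner.
    Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}

/// Ask the kernel to move up to `len` bytes from `input` into `pipe_wr`.
fn splice_into_pipe(input: &File, pipe_wr: &File, len: usize) -> io::Result<usize> {
    let n = unsafe {
        splice(
            input.as_raw_fd(),
            ptr::null_mut(), // use and advance the file's own offset
            pipe_wr.as_raw_fd(),
            ptr::null_mut(), // must be NULL for a pipe
            len,
            0, // no flags
        )
    };
    if n < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(n as usize)
    }
}

/// Splice the first `len` bytes of `path` into a pipe and read them back.
fn head_via_splice(path: &str, len: usize) -> io::Result<Vec<u8>> {
    let input = File::open(path)?;
    let (mut rd, wr) = make_pipe()?;
    let n = splice_into_pipe(&input, &wr, len)?;
    drop(wr); // close the write end so the read below sees end-of-file
    let mut out = Vec::with_capacity(n);
    rd.read_to_end(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    for path in std::env::args().skip(1) {
        let bytes = head_via_splice(&path, 64)?;
        println!("{}: spliced {} bytes into a pipe", path, bytes.len());
    }
    Ok(())
}
```

Reading the bytes back out of the pipe with `read_to_end` does copy them to userspace, of course; that part is only there so the sketch has something to print.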
There is one caveat, though: we cannot really copy the file directly to standard out, because splice requires one file descriptor to be a pipe. The way around that is to create a pipe, which consists of a reader and a writer (`rd` and `wr`). We pipe the file into the writer, and then we read from the pipe and push the data to stdout.
You can see that I use a relatively big buffer of 16384 bytes (2^14) to improve performance.
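extern crate nix;

use std::env;
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

use nix::fcntl::{splice, SpliceFFlags};
use nix::unistd::pipe;

// Number of bytes requested per splice call.
const BUF_SIZE: usize = 16384;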
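The two-step dance described above (file into the pipe's writer, pipe's reader into the output) can be sketched roughly as follows. This is a self-contained approximation rather than fcat's actual source: it declares `splice` and `pipe` by hand instead of using nix so that it builds without dependencies, it is Linux-only, and `make_pipe`, `splice_once` and `splice_copy` are hypothetical names.

```rust
use std::fs::File;
use std::io;
use std::os::raw::{c_int, c_uint};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::ptr;

const BUF_SIZE: usize = 16384;

// Hand-written libc declarations (an assumption of this sketch).
unsafe extern "C" {
    fn splice(
        fd_in: c_int,
        off_in: *mut i64,
        fd_out: c_int,
        off_out: *mut i64,
        len: usize,
        flags: c_uint,
    ) -> isize;
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Create a pipe and return its (read end, write end) as owned `File`s.
fn make_pipe() -> io::Result<(File, File)> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: both descriptors were just created and have no other owner.
    Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}

/// One splice call, with the C-style return value turned into a `Result`.
fn splice_once(fd_in: c_int, fd_out: c_int, len: usize) -> io::Result<usize> {
    let n = unsafe { splice(fd_in, ptr::null_mut(), fd_out, ptr::null_mut(), len, 0) };
    if n < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(n as usize)
    }
}

/// Copy all of `input` to `output` by way of an intermediate pipe.
fn splice_copy<O: AsRawFd>(input: &File, output: &O) -> io::Result<u64> {
    let (rd, wr) = make_pipe()?;
    let mut total = 0u64;
    loop {
        // Step 1: file -> write end of the pipe.
        let filled = splice_once(input.as_raw_fd(), wr.as_raw_fd(), BUF_SIZE)?;
        if filled == 0 {
            return Ok(total); // end of input
        }
        // Step 2: read end of the pipe -> output, until the pipe is empty again.
        let mut left = filled;
        while left > 0 {
            let moved = splice_once(rd.as_raw_fd(), output.as_raw_fd(), left)?;
            if moved == 0 {
                return Err(io::Error::from(io::ErrorKind::WriteZero));
            }
            left -= moved;
        }
        total += filled as u64;
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    for path in std::env::args().skip(1) {
        let input = File::open(&path)?;
        splice_copy(&input, &stdout)?;
    }
    Ok(())
}
```

The inner `while` loop is a precaution: a single splice out of the pipe may move fewer bytes than were put in, so the sketch keeps going until the pipe is drained before refilling it.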
So, how fast is this?
fcat myfile | pv -r > /dev/null
[5.90GiB/s]
Holy guacamole. That’s over three times as fast as system cat.
Operating System support
- Linux and Android are fully supported.
- OpenBSD also has some sort of splice implementation called `sosplice`. I didn’t test that, though.
- On macOS, the closest thing to splice is its bigger brother, sendfile, which can send a file to a socket within the Kernel. Unfortunately, it does not support sending from file to file. [2] There’s also `copyfile`, which has a similar interface, but unfortunately, it is not zero-copy. (I thought so in the beginning, but I was wrong.)
- Windows doesn’t provide zero-copy file-to-file transfer (only file-to-socket transfer using the TransmitFile API).
Nevertheless, in a production-grade implementation, the splice support could be activated on systems that support it, while using a generic implementation as a fallback.
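One way such a fallback could be wired up is sketched below. It is an illustration, not part of fcat: the `fast` module, `try_splice` and `copy_with_fallback` are hypothetical names, `splice` is declared by hand rather than through nix, and the generic path is simply `std::io::copy`.

```rust
use std::fs::File;
use std::io;

// Platforms with splice: call it directly (hand-written declaration,
// an assumption of this sketch).
#[cfg(any(target_os = "linux", target_os = "android"))]
mod fast {
    use std::fs::File;
    use std::io;
    use std::os::raw::{c_int, c_uint};
    use std::os::unix::io::AsRawFd;
    use std::ptr;

    unsafe extern "C" {
        fn splice(
            fd_in: c_int,
            off_in: *mut i64,
            fd_out: c_int,
            off_out: *mut i64,
            len: usize,
            flags: c_uint,
        ) -> isize;
    }

    /// One splice call; fails with EINVAL unless one side is a pipe.
    pub fn try_splice(input: &File, output: &File, len: usize) -> io::Result<usize> {
        let n = unsafe {
            splice(
                input.as_raw_fd(),
                ptr::null_mut(),
                output.as_raw_fd(),
                ptr::null_mut(),
                len,
                0,
            )
        };
        if n < 0 {
            Err(io::Error::last_os_error())
        } else {
            Ok(n as usize)
        }
    }
}

// Everywhere else: report failure so the caller takes the generic path.
#[cfg(not(any(target_os = "linux", target_os = "android")))]
mod fast {
    use std::fs::File;
    use std::io;

    pub fn try_splice(_input: &File, _output: &File, _len: usize) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "splice is not available here"))
    }
}

/// Copy `input` to `output`, preferring splice and finishing with a plain
/// userspace copy as soon as splice refuses to cooperate.
fn copy_with_fallback(input: &mut File, output: &mut File) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        match fast::try_splice(input, output, 16384) {
            Ok(0) => return Ok(total), // end of input
            Ok(n) => total += n as u64,
            // Both file offsets sit where splice left them, so the generic
            // copy continues from there. A stricter version would only fall
            // back on EINVAL and report other errors as they are.
            Err(_) => return Ok(total + io::copy(input, output)?),
        }
    }
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} SRC DST", args[0]);
        std::process::exit(2);
    }
    let mut input = File::open(&args[1])?;
    let mut output = File::create(&args[2])?;
    let copied = copy_with_fallback(&mut input, &mut output)?;
    eprintln!("copied {} bytes", copied);
    Ok(())
}
```

With two regular files, as in `main` above, neither side is a pipe, so on Linux the first splice call fails and the whole copy goes through the fallback; the fast path only kicks in when one of the two files is actually a pipe.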
Nice, but why on earth would I want that?
I have no idea. Probably you don’t, because your bottleneck is somewhere else. That said, many people use `cat` for piping data into another process like
# Count all lines in C files
cat *.c | wc -l
or
|
In this case, if you notice that `cat` is the bottleneck, try `fcat` (but first, try to avoid `cat` altogether).
With some more work, `fcat` could also be used to directly route packets from one network card to another, similar to netcat.
Lessons learned
- The closer we get to bare metal, the more our hard-won abstractions fall apart, and we are back to low-level systems programming.
- Apart from a fast cat, there’s also a use-case for a slow cat: old computers. For that purpose, there’s — you guessed it — slowcat.
That said, I still have no idea why GNU cat does not use splice on Linux. 🤔 The source code for fcat is on GitHub. Contributions welcome!
Footnotes
1. Thanks to reader Freeky for making this code more idiomatic.
2. Thanks to reader masklinn for the hint.
Thanks to Olaf Gladis, Patrick Pokatilo, Simon Brüggen for reviewing drafts of this article.