Several years ago, I wanted to write something in Haskell that was a bit bigger than the toys that I had written. I wanted something that did some non trivial IO and some amount of logic. I decided I would write a BitTorrent tracker. You can find it with my most recent updates on github
BitTorrent is still in use, but it isn’t in as popular and isn’t in the news nearly as much. BitTorrent is a peer to peer file sharing protocol that sends a file in chunks, and people connect to peers to get the chunks. It’s common for ISO’s to have torrent links in addition to HTTP downloads. The purpose of a tracker is to help peers find each other. The tracker keeps an associative map between hosts and torrent hashes. It’s important that the tracker only has the hash of a torrent file. They don’t have the file and they don’t know what is in it.
The tracker protocol is essentially a small web service. It doesn’t follow REST principles. There are 2 types of @GET@ requests: @announce@ and @scrape@. An announce request registers a status update of a host, peer, that is interested in a given torrent hash. A peer makes an announce request when it starts downloading, when it completes downloading, and if it is no longer participating in a torrent. In response, the tracker returns a list of peers that are requesting the same torrent hash. Scrape requests ask for information about how many different peers are interested in a given torrent hash, how many are actively downloading, how many are seeding the torrent.
The UDP protocol is a simple request response protocol with a trick to prevent amplification. First a client must make a connection, and the server returns a connection id. The connection id must be supplied along with any further connection ids. The standard requests that a tracker answers all have corresponding binary UDP requests. They use a standard trick of using the same header for all requests, including connection requests, and then allowing the suffix to vary based on the type of request.
The core server is written to take algebraic data types to algebraic data types. For example the core server functions have the types
handleAnnounce :: AnnounceRequest -> AnnounceT AnnounceResponse
handleScrape :: IpVersion -> [ScrapeRequest] -> AnnounceT [ScrapeResponse]
where AnnounceT
wraps some state and IO. We pass the
IpVersion
to handleScrape because we treat Ipv4 and Ipv6 as
completely separate trackers.
This made implementing both the UDP Protocol and the web protocol straightforward. Most of the work is parsing and serializing the binary protocol and only a small amount of logic is specific to either HTTP or UDP.
BitTorrent has a system of encoding callled Bencoding. I implemented a type class for serializing to the encoding. This means that we don’t have to create an intermediate value before serializing, we can serialize directly. In order to make it easier to test, I also implemented the intermediate value, and I made it implement the typeclass. This means that you can write a serialization instance, and it doesn’t need to go through an intermediate value, but you can take the same code and have it produce an intermediate value if you want to verify that it is serializing correctly.
Another thing that I did was to use a church encoded
EitherT
type in the web portion. The type has two
continuations, one for success and the other for failure. For chained
computations this avoids re-evaluating the tags on an
Either
value. Yes, this was premature optimization. This is
a learning project, and this was interesting to me.
Overall, this was a really useful exercise. I learned a lot about
Data.Binary
There are other libraries that are faster for
binary serialization and deserialization, but they all work similarly. I
got to use type families and GADTs for the Bencoding. I solved a real
problem that needed to do real IO including sockets. The
ByteString socket library is really nice, and Haskell’s green
threading and asynchronous IO make using sockets look like C, but
cleaned up. Looking at the code years later, I am still pretty happy
with the code. To be production ready it would need more work, with
things like logging and config files, but I am still pretty happy with
it.