A Bittorrent Tracker Written in Haskell

Several years ago, I wanted to write something in Haskell that was a bit bigger than the toys that I had written. I wanted something that did some non trivial IO and some amount of logic. I decided I would write a BitTorrent tracker. You can find it with my most recent updates on github

What is a tracker

BitTorrent is still in use, but it isn’t in as popular and isn’t in the news nearly as much. BitTorrent is a peer to peer file sharing protocol that sends a file in chunks, and people connect to peers to get the chunks. It’s common for ISO’s to have torrent links in addition to HTTP downloads. The purpose of a tracker is to help peers find each other. The tracker keeps an associative map between hosts and torrent hashes. It’s important that the tracker only has the hash of a torrent file. They don’t have the file and they don’t know what is in it.

Tracker Protocol

The tracker protocol is essentially a small web service. It doesn’t follow REST principles. There are 2 types of @GET@ requests: @announce@ and @scrape@. An announce request registers a status update of a host, peer, that is interested in a given torrent hash. A peer makes an announce request when it starts downloading, when it completes downloading, and if it is no longer participating in a torrent. In response, the tracker returns a list of peers that are requesting the same torrent hash. Scrape requests ask for information about how many different peers are interested in a given torrent hash, how many are actively downloading, how many are seeding the torrent.

UDP Protocol

The UDP protocol is a simple request response protocol with a trick to prevent amplification. First a client must make a connection, and the server returns a connection id. The connection id must be supplied along with any further connection ids. The standard requests that a tracker answers all have corresponding binary UDP requests. They use a standard trick of using the same header for all requests, including connection requests, and then allowing the suffix to vary based on the type of request.

An Abstract Server

The core server is written to take algebraic data types to algebraic data types. For example the core server functions have the types

handleAnnounce :: AnnounceRequest -> AnnounceT AnnounceResponse
handleScrape :: IpVersion -> [ScrapeRequest] -> AnnounceT [ScrapeResponse]

where AnnounceT wraps some state and IO. We pass the IpVersion to handleScrape because we treat Ipv4 and Ipv6 as completely separate trackers.

This made implementing both the UDP Protocol and the web protocol straightforward. Most of the work is parsing and serializing the binary protocol and only a small amount of logic is specific to either HTTP or UDP.

A couple of innovations

BitTorrent has a system of encoding callled Bencoding. I implemented a type class for serializing to the encoding. This means that we don’t have to create an intermediate value before serializing, we can serialize directly. In order to make it easier to test, I also implemented the intermediate value, and I made it implement the typeclass. This means that you can write a serialization instance, and it doesn’t need to go through an intermediate value, but you can take the same code and have it produce an intermediate value if you want to verify that it is serializing correctly.

Another thing that I did was to use a church encoded EitherT type in the web portion. The type has two continuations, one for success and the other for failure. For chained computations this avoids re-evaluating the tags on an Either value. Yes, this was premature optimization. This is a learning project, and this was interesting to me.

Lessons

Overall, this was a really useful exercise. I learned a lot about Data.Binary There are other libraries that are faster for binary serialization and deserialization, but they all work similarly. I got to use type families and GADTs for the Bencoding. I solved a real problem that needed to do real IO including sockets. The ByteString socket library is really nice, and Haskell’s green threading and asynchronous IO make using sockets look like C, but cleaned up. Looking at the code years later, I am still pretty happy with the code. To be production ready it would need more work, with things like logging and config files, but I am still pretty happy with it.