Online content providers are facing an explosion in demand for data as more and more people turn to the Internet for dynamic and personalized information. As a result, the search for technologies to optimize the use of network bandwidth has become a vital part of many content distribution strategies. Peer-to-peer protocols like BitTorrent are being adopted by content providers to distribute data more efficiently without increasing network bandwidth costs.
BitTorrent is a peer-to-peer file distribution protocol whose original implementation is written in Python. Its source code is covered by an open source MIT license. It was developed over the course of two years by Bram Cohen and subsequently released at the 2002 CodeCon conference in San Francisco. The basic goal of BitTorrent is to enable the primary content source to distribute one or more static files to a large number of concurrent downloaders (clients) without incurring massive network loads. It does so by using the otherwise idle upload bandwidth of each client, making it possible for clients to share parts of the source file with each other.
An Example
As a demonstration of BitTorrent in action, let’s walk through the process of downloading a file. I assume you have downloaded and installed a BitTorrent client program on your system at this point. For the purposes of this example, we will download an ISO CD image (fedora-cd1.iso) for the Fedora Linux distribution.
For us to download with BitTorrent, the file’s distributor has to place a static file with the ‘.torrent’ extension on its Web server. The Web server should be configured so that files with the .torrent extension are served with the ‘application/x-bittorrent’ MIME type. The metadata (.torrent) file can be linked to like any other file available on the Web. In our example the link would be http://www.somesite.com/fedora-cd1.iso.torrent.
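How the MIME type is configured depends on the Web server in use. As a minimal sketch, assuming the .torrent file is served by Python's standard http.server module rather than a production server, the association could be added as follows. The class name TorrentAwareHandler is made up for this illustration.

```python
from http.server import SimpleHTTPRequestHandler


class TorrentAwareHandler(SimpleHTTPRequestHandler):
    """Static file handler that knows about the .torrent extension.

    TorrentAwareHandler is a hypothetical name used for this sketch only.
    """

    # Copy the default extension table and add the BitTorrent MIME type.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".torrent": "application/x-bittorrent",
    }

# To serve the current directory on port 8000 (an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("", 8000), TorrentAwareHandler).serve_forever()
```

Other Web servers have their own mechanism for mapping file extensions to MIME types, so treat this only as an illustration of the idea.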
The metadata file (fedora-cd1.iso.torrent) contains information describing the actual source file (fedora-cd1.iso), and the URL for the “tracker” (see Figure 1). Specifically, the metadata fields are the file name, the file length, the individual piece length, a hash code for each piece, and the DNS name or IP address of the tracker. The tracker plays an important role by helping clients find each other. It does so by providing each client with a random list of other peer clients, using a simplified protocol on top of HTTP. Clients report their progress to the tracker periodically during the download process, but the tracker has no knowledge of the contents of the file being distributed.
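The fields listed above, and the tracker's role of handing out random peer lists, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real file format or tracker protocol: the names MetaInfo, build_metainfo and Tracker, the tracker host name, the 256-byte piece length and the limit of 50 peers per reply are all invented for the example, and SHA1 is used for the piece hashes.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class MetaInfo:
    """Illustrative container for the fields a .torrent file describes."""
    file_name: str
    file_length: int
    piece_length: int
    piece_hashes: List[bytes]  # one 20-byte SHA1 digest per piece
    tracker: str               # DNS name or IP address of the tracker


def build_metainfo(file_name, data, piece_length, tracker):
    """Split data into fixed-size pieces and hash each piece."""
    hashes = [
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    ]
    return MetaInfo(file_name, len(data), piece_length, hashes, tracker)


class Tracker:
    """Hypothetical in-memory tracker: it knows peers, never file contents."""

    def __init__(self):
        self.peers: Set[Tuple[str, int]] = set()

    def announce(self, peer, max_peers=50):
        """Register a peer and return a random list of other known peers."""
        others = sorted(self.peers - {peer})
        self.peers.add(peer)
        return random.sample(others, min(max_peers, len(others)))


# Tiny stand-in for the ISO image: 1000 bytes split into 256-byte pieces.
meta = build_metainfo("fedora-cd1.iso", b"\x00" * 1000, 256,
                      "tracker.example.com")
tracker = Tracker()
first = tracker.announce(("10.0.0.1", 6881))   # no other peers known yet
second = tracker.announce(("10.0.0.2", 6881))  # learns about the first peer
```

Note that the tracker in this sketch stores only peer addresses, which mirrors the point that it has no knowledge of the file's contents.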
BitTorrent does not centrally manage resource allocation. Instead, each client attempts to maximize its download rate by controlling various protocol parameters. Clients make direct connections (using ports 6881-6889 by default) to one or more of the clients in the list, to exchange parts of the file. Direct connections between clients are duplex (bi-directional), and every client tries to maintain the greatest number of active connections. A client’s temporary refusal to upload is known as choking. Connections are choked to prevent leeching, a situation in which another client is downloading but not uploading.
To maximize the number of duplex connections, clients reward each other by reciprocating uploads. Clients unwilling to upload will therefore find their download rate dropping as other clients choke them in response. Clients decide which connections to choke or unchoke by calculating the current download rate of each connection once every ten seconds. The connection is left choked or unchoked until the next ten-second period is up. This fixed-interval cycle prevents clients from rapidly choking and unchoking, which would waste network resources. Finally, a client does an “optimistic” unchoke once every 30 seconds, to try out unused connections and determine whether they might offer better transfer rates than the current ones.
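The choking decision described above can be sketched as two small functions. This is a simplified illustration, not the code of any particular client: the function names are invented, the number of upload slots (four) is an assumption that the text does not specify, and the ten-second and 30-second timers are left to the caller.

```python
import random

UNCHOKE_SLOTS = 4  # assumption: the article does not say how many peers are unchoked


def choose_unchoked(download_rates, slots=UNCHOKE_SLOTS):
    """Run once every ten seconds.

    download_rates maps a peer to the rate at which we are currently
    downloading from it. The peers giving us the best rates are unchoked;
    every other peer stays choked until the next period.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:slots])


def optimistic_unchoke(all_peers, unchoked, rng=random):
    """Run once every 30 seconds.

    Pick one currently choked peer at random to try out, or return None
    if every peer is already unchoked.
    """
    candidates = sorted(set(all_peers) - set(unchoked))
    return rng.choice(candidates) if candidates else None


rates = {"a": 50.0, "b": 10.0, "c": 0.0, "d": 80.0, "e": 30.0, "f": 5.0}
unchoked = choose_unchoked(rates)            # the four fastest peers
trial = optimistic_unchoke(rates, unchoked)  # one of the remaining peers
```

In this example peer "c" uploads nothing to us and is never selected by choose_unchoked, so its only chance of receiving data is the occasional optimistic unchoke.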
To start off the download process, a client that already possesses the entire file introduces it into the network by supplying other clients joining the torrent with random pieces of the source file. This client is referred to as the “seed.” The source file is split into fixed-size pieces (each identified so that clients can keep track of which parts of the file they have or don’t have). Hash codes (SHA1) for each piece are provided to clients via the metadata (.torrent) file. A client can report the successful download of an individual piece only after it has verified the corresponding hash code locally.
To further optimize data transfer, delays between individual piece transfers are avoided by splitting each fixed-size piece into smaller fixed-size sub-pieces. This enables several sub-piece requests to be pipelined at once, keeping the TCP connection saturated.
The order in which a client selects pieces to download plays an important role in download performance. Initially, when the download begins, clients request the pieces they need in random order. This reduces the total download time for a particular piece because it allows each client to download sub-pieces from multiple clients, as opposed to everyone trying to get the same piece from a single client. After a client has successfully downloaded its first piece, the piece selection order changes to what is known as “rarest first.” Using this method, clients request the pieces that are possessed by the fewest peer clients. “Rarest first” performs well because it makes sure that clients have pieces that other peers want, reducing the chances of a peer having nothing to upload.
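The three mechanisms just described (hash verification, sub-piece requests and “rarest first” selection) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names are invented, the 16 KB sub-piece size is an assumption not given in the text, ties between equally rare pieces are broken at random, and each peer's holdings are modeled as a plain set of piece indices.

```python
import hashlib
import random
from collections import Counter


def piece_is_valid(piece_data, expected_sha1):
    """A piece counts as downloaded only if its SHA1 matches the metadata."""
    return hashlib.sha1(piece_data).digest() == expected_sha1


def sub_piece_requests(piece_length, sub_piece_size=16384):
    """Split one piece into (offset, length) requests that can be pipelined.

    The 16 KB default is an assumption made for this sketch.
    """
    return [
        (offset, min(sub_piece_size, piece_length - offset))
        for offset in range(0, piece_length, sub_piece_size)
    ]


def pick_rarest(needed, peer_pieces, rng=random):
    """Choose the needed piece held by the fewest peers.

    needed is the set of piece indices we still lack; peer_pieces maps
    each peer to the set of piece indices that peer has. Returns None if
    no peer has anything we need.
    """
    counts = Counter()
    for have in peer_pieces.values():
        counts.update(have & needed)
    if not counts:
        return None
    fewest = min(counts.values())
    rarest = sorted(i for i, n in counts.items() if n == fewest)
    return rng.choice(rarest)  # assumption: random tie-break


peers = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0}}
next_piece = pick_rarest({0, 1, 2, 3}, peers)  # piece 2 is held by one peer only
requests = sub_piece_requests(40000)
```

Here piece 3 is needed but no peer has it yet, so it is simply not a candidate; among the available pieces, piece 2 is the rarest and is requested first.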
After you finish downloading the file, you should allow the client to continue uploading for the benefit of other downloaders. After all, it is this idea of sharing that makes BitTorrent so successful.
BitTorrent is quickly becoming a viable option for many content providers. Recently, Lindows.com (now known as Linspire) announced a 50 percent discount for customers who download their copy of Lindows OS using BitTorrent. Not only does this technology have a future in computing, but it is also enabling organizations in the fields of bioinformatics and high-energy physics to share large amounts of data efficiently among an ever-growing community of researchers worldwide. Future enhancements could include an option for encryption of data streams, enabling organizations to share sensitive data. As an open source project, BitTorrent will continue to offer compelling features to meet the challenges of large-scale data distribution.
You can find out more about BitTorrent on the Web at www.bittorrent.com.