The distribution server will
allow the content distributor to encode the content and upload it to a
predefined group of servers, which will be specified in a configuration file.
Each chunk of the encoded data
will be stored in a separate file, since this will make the download manager
implementation simpler. Also, not all servers allow downloading from an arbitrary
position in a file (e.g. some proxy servers do not support it). The names of
the encoded files will be determined by the application based on the original
file name. As mentioned above, there is a limit of 8 servers at this moment.
The distribution application
will also prepare the HTML page that contains the client applet, which will
manage the download process on the client machine. The HTML file will also embed
the locations of the encoded data (the list of mirror sites). This list will be
passed to the applet when the applet is activated.
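As an illustration of how the applet might read the embedded mirror list, the
sketch below assumes that the distribution application writes the locations as
a single comma-separated string of URLs. The separator, the class name
MirrorListParser and the method name are assumptions made for this example;
the actual page format is not specified here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns the embedded mirror list into URIs. */
final class MirrorListParser {
    /** Upper bound on the number of servers, as stated in the text. */
    static final int MAX_SERVERS = 8;

    /**
     * Parses a comma-separated list of mirror locations (an assumed format).
     * Blank entries are skipped; more than MAX_SERVERS entries is an error.
     */
    static List<URI> parse(String embedded) {
        List<URI> mirrors = new ArrayList<>();
        for (String entry : embedded.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            mirrors.add(URI.create(trimmed));
        }
        if (mirrors.size() > MAX_SERVERS) {
            throw new IllegalArgumentException(
                    "at most " + MAX_SERVERS + " mirror sites are supported");
        }
        return mirrors;
    }
}
```

In an applet, the input string could come from an applet parameter in the HTML
page (for example one hypothetically named "mirrors"), read when the applet
starts.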
Note: the encoding procedure is
done off-line.
The download manager
The user can start the file
download either by following a link from a web page or by entering the URL of
the download manager (the HTML page). When the user initiates the download
process (by either method), the user receives the HTML page, which was prepared
(customized) by the distribution application for this particular file.
The HTML page contains a Java
applet, which is the download manager and is described below.
First, the Java applet
parses the HTML page and extracts the locations of the mirror sites. It then
opens HTTP connections to all the mirror sites and starts downloading the first
chunk of data from all of them in parallel. Once the first chunk has been
downloaded, the applet starts downloading the second chunk and decoding the
first chunk in parallel, and so on for the following chunks. When the last chunk
finishes downloading, the user must wait until the decoding of that chunk is
over. This is the only time when the user actually feels the cost of the
decoding process. We try to minimize this time by choosing a small chunk size,
although it is negligible on fast CPUs and on JVMs that support just-in-time
(JIT) compilation.
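The overlap between downloading and decoding described above can be sketched
as a two-stage pipeline. This is a minimal sketch and not the applet's actual
code: the class name ChunkPipeline and the download and decode callbacks are
placeholders invented for the example, and it uses library classes from
current Java versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;
import java.util.function.UnaryOperator;

/** Sketch of the download/decode overlap; all names are illustrative. */
final class ChunkPipeline {
    /**
     * Downloads chunks 0..chunkCount-1 in order on the calling thread.
     * While chunk i+1 is being downloaded, chunk i is decoded on a
     * separate worker thread.
     */
    static List<byte[]> run(int chunkCount,
                            IntFunction<byte[]> download,
                            UnaryOperator<byte[]> decode) {
        ExecutorService decoder = Executors.newSingleThreadExecutor();
        try {
            List<byte[]> decoded = new ArrayList<>();
            CompletableFuture<byte[]> pending = null; // decode of the previous chunk
            for (int i = 0; i < chunkCount; i++) {
                byte[] encoded = download.apply(i);   // runs while 'pending' decodes
                if (pending != null) {
                    decoded.add(pending.join());      // collect the previous chunk
                }
                pending = CompletableFuture.supplyAsync(
                        () -> decode.apply(encoded), decoder);
            }
            if (pending != null) {
                decoded.add(pending.join());          // the only wait the user notices
            }
            return decoded;
        } finally {
            decoder.shutdown();
        }
    }
}
```

With this structure the decoding time of every chunk except the last one is
hidden behind the download of the next chunk, which is why a smaller chunk
size shortens the final wait.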
When the process finishes, the
downloaded file will be saved to disk.
How is a chunk downloaded?
We receive data from all the
servers, and as soon as we have enough data for the chunk we abort all existing
connections. To avoid sending unnecessary packets over the network, we close all
but the fastest connections once the amount of data already received is close
enough to the chunk size. We close the last connection as soon as the last byte
needed for decoding arrives. However, because of the way TCP works, some
unnecessary data may still be received (for example, with a large window size
or on a fast network).
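One possible way to express this closing rule is as a small decision function.
The sketch below rests on several assumptions that the text does not fix: the
threshold nearEndBytes that defines "close enough", the number of connections
to keep, and the use of bytes delivered so far as a stand-in for connection
speed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of the connection-closing rule; thresholds are assumptions. */
final class ConnectionPruner {
    /**
     * Returns the indices of the connections to close:
     * all of them once the bytes needed for the chunk have arrived,
     * all but the 'keep' fastest once at most 'nearEndBytes' are missing,
     * and none otherwise.
     */
    static List<Integer> toClose(long[] bytesPerConnection, long bytesNeeded,
                                 long nearEndBytes, int keep) {
        long received = 0;
        for (long b : bytesPerConnection) {
            received += b;
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < bytesPerConnection.length; i++) {
            order.add(i);
        }
        if (received >= bytesNeeded) {
            return order;                  // chunk complete: abort everything
        }
        if (bytesNeeded - received > nearEndBytes) {
            return new ArrayList<>();      // still far from the end: keep all
        }
        // Near the end: rank connections by bytes delivered, fastest first.
        order.sort(Comparator
                .comparingLong((Integer i) -> bytesPerConnection[i])
                .reversed());
        int kept = Math.min(keep, order.size());
        return new ArrayList<>(order.subList(kept, order.size()));
    }
}
```

For example, with three connections that have delivered 100, 500 and 300 bytes
of a 1000-byte chunk and a threshold of 200 bytes, only the second connection
stays open.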
Since opening a new
connection to a remote server takes time (the three-way handshake, plus sending
and processing the request header), we open the connections for the next chunk
before the current chunk is completely downloaded. The amount of data sent
during the setup is small, and we save precious time that would otherwise be
wasted.
If a connection to one of the
servers fails, we try to open another connection to that server and resume
downloading from the position where we stopped. If resume is not available, we
might consider starting from the beginning, or giving up on this server for the
current chunk, depending on the amount of data we have already downloaded from
this server, and the download progress on other connections.
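The recovery choice can be sketched as follows. Resuming over HTTP is normally
done with a Range request header of the form "bytes=N-". The decision rule for
the case without resume is only one possible reading of the paragraph above:
the two thresholds (cheapRestartBytes and nearEndBytes) and all names in the
sketch are assumptions, not values taken from the design.

```java
/** Sketch of the failure-recovery choice; thresholds are assumptions. */
final class ResumePolicy {
    enum Action { RESUME, RESTART, GIVE_UP }

    /** Value of the HTTP Range header asking for all bytes from 'offset' on. */
    static String rangeHeader(long offset) {
        return "bytes=" + offset + "-";
    }

    /**
     * Chooses what to do after a connection to one server fails.
     *
     * @param resumeSupported   whether the server honours range requests
     * @param bytesFromServer   bytes already downloaded from this server
     * @param bytesStillNeeded  bytes the chunk still needs from any server
     * @param cheapRestartBytes restart only if no more than this would be re-read
     * @param nearEndBytes      give up if the chunk is this close to complete
     */
    static Action afterFailure(boolean resumeSupported, long bytesFromServer,
                               long bytesStillNeeded, long cheapRestartBytes,
                               long nearEndBytes) {
        if (resumeSupported) {
            return Action.RESUME;
        }
        if (bytesStillNeeded <= nearEndBytes) {
            return Action.GIVE_UP;   // other connections will finish the chunk
        }
        return bytesFromServer <= cheapRestartBytes
                ? Action.RESTART     // little is lost by starting over
                : Action.GIVE_UP;    // too much to re-download for this chunk
    }
}
```

With java.net.HttpURLConnection, the header could be set through
setRequestProperty("Range", ResumePolicy.rangeHeader(offset)); a server that
honours the request answers with status 206 (Partial Content).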
The data format
Our basic unit of work is a packet
(1 KB). A strip is a sequence of packets, in our case 32 packets.
Therefore, the size of a strip is 32 KB. 32 strips are combined to form a chunk
(1 MB).
Strips are the basic units of
the encoding/decoding process, while chunks are the basic units of the download
process. We store each chunk of encoded data in a separate file.
According to our FEC algorithm,
each server must hold a complete (encoded) copy of the data. Therefore, the total
size of the encoded data is the size of the original data multiplied by the
number of the data servers.
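The unit sizes and the storage cost stated in this section can be checked with
a few constants. The class and method names below are illustrative, and the
rounding up of a final partial chunk is an assumption, since the text does not
say how a file that is not a multiple of the chunk size is handled.

```java
/** Sizes from this section expressed as constants (names are illustrative). */
final class DataLayout {
    static final int PACKET_BYTES = 1024;                            // 1 KB
    static final int PACKETS_PER_STRIP = 32;
    static final int STRIPS_PER_CHUNK = 32;
    static final int STRIP_BYTES = PACKET_BYTES * PACKETS_PER_STRIP; // 32 KB
    static final int CHUNK_BYTES = STRIP_BYTES * STRIPS_PER_CHUNK;   // 1 MB

    /** Number of chunks for a file, counting a final partial chunk (assumed). */
    static long chunkCount(long fileBytes) {
        return (fileBytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    }

    /** Total encoded size over all servers: original size times server count. */
    static long totalEncodedBytes(long fileBytes, int servers) {
        return fileBytes * servers;
    }
}
```

For a 10 MB file and the current maximum of 8 servers, this gives 10 chunks
and 80 MB of encoded data in total.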
Each encoded chunk of data is an
interleaved sequence of encoded strips.
Suppose our chunk has only four
strips. The encoding procedure can then be described as follows:
The format of the original data
chunk: