Computer
Science Department
Technion
Communication Lab
Implementation of Proxy server with efficient
cache removal algorithm
Final Report
Vladislav Kalinovsky c1162495@t2.technion.ac.il
Or Gerlitz s2895834@t2.technion.ac.il
Mirkin Lidia c1170451@t2.technion.ac.il
Tables of contens
1.Introduction
2.Protocols and algorithms
3. Proxy server work diagram
4. Modules diagram
5. Classes description
Introduction
Today networks become more and more over-loaded. Often,
the time needed to retrieve data might make the response time for the client's
requests too long. Proxy server, called also forward server, is primarily
used to reduce the load on the network and to make the load time shorter. This
goal is reached by the maintenance of a large cache, managed by an efficient
algorithm. This lab project is focused on the implementation of the HYB
cache removal algorithm.
The Proxy server uses the multi-thread approach to deal with multiple clients
at once. This approach has many advantages vs the traditional 'fork()'
one.
Protocols and Algorithms
Internet
Protocol
- The server works according to the http protocol.
The server uses it's own HTTP parser, which can handle properly HTTP/2.0
and the previous ones.
- For more details on the Http protocol look at Http
resource.
Communication
Protocol
- The communication between the clients and the Proxy Server,
as well the one between the server and the remote host is performed by
TCP/IP protocol using BSD sockets.
- For more details on the protocol look at BSD
sockets .
Server Application
Model
Cache Policy
- According to result of above publications, the Hybrid
removal algorithm (below noted as HYB ) was chosen for implementation.
The estimation is done in a manner similar to RTT estimation in TCP . For
each server j, the proxy maintains an estimated latency (time)
to open a connection to the server, denoted clatj , and
estimated bandwidth of the connection (in bytes/second), denoted cbwj
. Whenever a new document is received from the server, the connection
establishment latency and bandwidth for that
document are measured (let these samples be denoted by sclat
and scbw , respectively), the estimates are updated as
follows:
clatj = (1-ALPHA)
clatj + ALPHA sclat
cbwj = (1-ALPHA)
cbwj + ALPHA scbw .
where ALPHA is a smoothing constant, set to 1/8 as it is in the TCP smoothed
estimation of round trip time.
- HYB is a hybrid of several factors, considering
not only download time but also number of references to a
document and document size.
Let ser(i) denote the server on which document i resides.
HYB selects for replacement the document i with the lowest value
value of the following expression:
W(clat ser(i)
, cbwser(i) , nrefi , si ) = (clatser(i)
+ W B /cbw ser(i) )(nrefi **WN )/
si
where nrefi is the number
of references to document i since it last entered the cache, si
is the size in bytes of document i, and WB and
WN are constants that set the relative importance
of the variables cbwser(i) and nrefj
, respectively. Therefore a document is not likely to be removed if HYB
is large, which would occur if the document resides on a server that takes
a long time to connect to (i.e., large clatser(i) ) and
is connected via a low bandwidth link (i.e., small cbwser(i)
), if the document has been referenced many times in the past (i.e., large
nrefi ), and the document size is small (e.g., small
si ). As for the weights, as WN
approaches zero document size (si) becomes more important
than number of references (nrefi ). As WB
is increased the importance of clat relative to cbw is increased.
Note that a proxy runs at the application layer of a network protocol stack,
and therefore would not be able to obtain the connection latency samples
s clat . Therefore the following heuristic is used to
estimate connection latency. A constant CONN is chosen (e.g., 2Kbytes).
Every document that the proxy receives whose size is less than CONN is
used as an estimate of
connection latency s clat . Every document whose size
exceeds CONN is used to as a bandwidth sample estimation .
- The following code extract illustrates the actual calculation
method.
#define BYTES-CONNECT-TIME-ONLY 2048
#define SMOOTH-COEF 0.125
/* int bytes - the number of bytes in the URL file */
/* int time - the milliseconds to download the URL file */
/* a separate msec-connect and CBW is kept for each server name
* this code has been simplified for readability */
if (bytes < BYTES-CONNECT-TIME-ONLY)
msec-connect += SMOOTH-COEF * (time - msec-connect);
else
/* more than just a small connect time only packet */
/* the multiplier of 1000 is required because the time is in
milliseconds, the 1000 converts this to seconds */
CBW += SMOOTH-COEF * (((1000 * (bytes - BYTES-CONNECT-TIME-ONLY)) /(time - msec-connect)) - CBW);
Proxy server
work diagram:
Modules Diagram
Classes description
The application had been implemented in the C++ language.The
three main Libraries that were used are:
- BSD sockets - communication
- Solaris threads - workers concurrency
- LEDA - different data structures (Queue,List,String etc.)
used by the cache and cache algorithm
For each class a short description is given followed by
listing of the public functions which actually make the interface.
ProxyServer
- Class name:
ProxyServer
- Description: Listens
for clients connecting to it via some fixed port. For each client the proxy
lets a new Worker deal with it's requests.
- Methods:
- acceptReqeust - The server runs in an infinite loop,
on each iteration it waits until a new client connection is accepted,
then it allocates new Worker and creates new thread to run it
- Fields:
- int serverListenPort - the port where all the clients
connect to
The following files contain source code of this class:
Signal Handler
- Module name:
signal_handler
- Description: Uses
various system and thread library calls to implement handling signals sent
to both the workers and the main thread. Enables asynchronous termination
and de-allocation of worker's resources(buffers, file descriptors etc.)
after a client closed it's socket or crashed.
- Methods:
- init_proxy_specific_key - init a special key used by
the system to supply each thread a unique storage location to be used when
handling a signal.
- set_abort_handler - is called by each new thread before
it starts to run the Worker. The thread gets a unique location where the
system stores the Worker address. The thread tells the system which signals
it wants to catch, and the name of the function to be called with the signal
argument.
- terminate_worker - called by the system to deliver a
signal.
If the current thread is one that executes some Worker, it gets the Worker
address from the system using it's specific key and then deletes the Worker
(the Worker's destructor frees the buffers and closes file descriptors). Once
all this is done the thread exits.
If the main thread caught a signal it 'flashes' it's stack and starts it's
listening loop again. This scheme is implemented using the 'setjmp' and
'longjmp' system calls.
The following files contain source code of this class:
Worker
- Class name:
Worker
- Description: Connected
both to the client and to the remote http server the worker handles the
data transmission.
- Methods:
- run - the main Worker's routine. Has the following steps
(calls to other private functions)
- read the client request and parse it
- check if the requested page is in the cache (if found
send it to the client and goto step 5)
- if the page is not in the cache, open connection to the
remote host send the header to the client and then do receive/send until
all the page was received
- store the page in the cache
- return
- Fields:
- httpRequest *parsedRequest - used to parse the client's
request and build request to remote server
- httpResponse *parsedResponse - used to parse the remote
server's response (header)
- Timer timer - used to measure timings for the cache algorithm
The following files contain source code of this class:
Parser
- Class
name: httpMsg
- Description: abstract
class, intended to make parsing to request or response according to dynamic
type
- Methods:
- httpMsg - gets string and its size
and calls the functions those make parsing.
- Fields:
- gh - stores general part of the header that is the same
for request and response
- eh - stores entity part of the header that is the same
for request and response
- Class name: httpRequest
- Description: inherits
from httpMsg class, and implements the deferred functions that specifically
intended to parsing HTTP request.
- Methods:
- httpRequest - gets string and its size and
calls the functions those make parsing of the request header.
- Fields:
- rl - stores parsed request line of the request
- rh - stores message header
- Class name: httpResponse
- Description: inherits
from httpMsg class, and implements the deferred functions that specifically
intended to parsing HTTP response.
- Methods:
- httpResponse - gets string and its size and
calls the
- functions those make parsing of the response header.
- Fields:
- rl - stores parsed response
line of the response
- rh - stores message header
The following files contain source code of this class:
Cache
- Class name: Cache
- Description: implements
cache data structure
- Methods
- findData - gets as parameters server name and file absolute
path on the server and return file if it in the cache
- storeData - gets as parameters
server name, file absolute path on the server, size of the file, receiving
time and file and try store it in the cache according to algorithm value
of the file.
- Fields:
- cachePath - path of cached files
- cacheAlgo - cache algorithm
- currDiskSize - size of currently stored files
- sizeOfCache - size of Cache
- serverTable - dictionary of stored servers information
- pagesTable - dictionary of store pages information
The following files contain source code of this class:
Algorithm
- Class name: Algorithm
- Description: implements
HYB cache removal algorithm
- Methods:
- getCandidates - gets as parameters size of the file we
want insert and returns list of the files to be remove from the cache
- getAlgorithmValue - returns the value of algorithm for
given file and server
The following files contain source code of this
class:
Timer
- Class name: Timer
- Description: works
as vanila timer watch
- Methods:
- start_timer - start the timing
- get_timer - returns time from the last start_timer
call until call to this functions
The following files contain source code of this class:
Structs
- Class name:
ServerInfo
- Description: stores
and manages a server information
- Methods:
- estimateNew - gets the size of block and transferring
time and estimates new connection latency and band width to the server.
- Fields:
- conLatency - time to connect to the server
- conBandWidth - average transfer speed to the server
- Class name: IndexInfo
- Description: stores
and manages a file information
- Methods:
- addRef - increments the file reference number
- Fields:
- serverName - name of the server which the file was received
from
- fileAbsoluteFile - name of the file
- numOfRef - number of references to the file
- pageSize - size of the file
The following files contain source code of this class:
Demo
- Class name: Demo
- Description: stores
and manages a demo file information
- Methods:
- beginDemo - open demo file and write start information
- endDemo - close demo file
- writeDemoAll - gets from cache server table and page
table and stores information into demo file
- writeDemoLine - gets from cache information on single
page and store it in demo file
- Fields:
The following files contain source code of this class: