This section describes a possible design of the manager module. The manager lives somewhere on the net and controls the operation of the HTTP / RSMC servers. The manager can run as a single instance or as multiple instances (a distributed manager). A distributed manager prevents the manager from becoming a bottleneck under heavy network load. The clients (browsers) send their HTTP requests to the manager, which reroutes each client to the appropriate server. Each request is therefore rerouted exactly once.

This adds network overhead (every request is sent twice), so why not request the URL directly from one of the servers?
Statistically, the chance that the requested URL is found on a specific server is low. Suppose we have 10 servers, each holding 20% of the files (some files are mirrored). The chance that the requested URL is found on a specific server is 1 in 5. If that server holds the requested URL, only one request is needed. If not, three requests are needed (the server reroutes the request to a manager, which reroutes it again to the appropriate server). The average number of requests per URL is then:

1/5 * 1 + 4/5 * 3 = 2.6

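Asking the manager first always costs exactly two requests, which is lower than the 2.6 average, so it is better to ask the manager first and then request the URL from the correct server.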
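
The comparison can be checked with a short calculation. The sketch below is only an illustration of the arithmetic above; the function names and the hit_probability parameter are assumptions introduced here and are not part of the design.

```python
from fractions import Fraction


def expected_requests_direct(hit_probability):
    """Average requests per URL when the client asks an arbitrary server first.

    A hit costs 1 request. A miss costs 3 requests (the wrong server,
    then the manager, then the correct server), as described in the text.
    """
    return hit_probability * 1 + (1 - hit_probability) * 3


def expected_requests_via_manager():
    """Asking the manager first always costs 2 requests (manager, then server)."""
    return 2


p = Fraction(1, 5)  # each server holds 20% of the files
direct = expected_requests_direct(p)
print(float(direct))                    # 2.6
print(expected_requests_via_manager())  # 2
```

Under this cost model the direct approach averages 3 - 2p requests, so it only beats the manager when the hit probability p is above 1/2, that is, when a single server holds more than half of all files.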

The manager communicates with the servers through a predefined collection of APIs called RSMC (Remote Server Management Commands). The RSMC requests are sent to the servers over RSMP (Remote Server Management Protocol). The RSMC API contains methods that allow the manager to query information from the servers and to transfer files between the servers. The whole system can operate as a distributed proxy server or as a distributed web server. In either case, the manager must know exactly which files exist on each server. With distributed managers, this information can be common to all managers (each manager holds all the information) or split between them. Splitting the information can be useful in large installations where there are many files on each server. For example, the file information can be split between two managers: the first holds information about files in the A-L interval and the second holds information about files in the M-Z interval. A more efficient way is to distribute the file information so that the load is equalized between the managers.
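
The two ways of splitting the file information can be sketched as follows. The first function follows the A-L / M-Z example from the text. The second is one possible way to equalize the load, by hashing the file name; the text does not say how the load should be equalized, so the hash-based split and both function names are assumptions made for illustration.

```python
import hashlib


def manager_by_first_letter(filename):
    """Range split from the text: A-L goes to manager 0, M-Z to manager 1.

    In this sketch any name whose first character sorts after "L"
    (including non-letters such as "_") also goes to manager 1.
    """
    first = filename[0].upper()
    return 0 if first <= "L" else 1


def manager_by_hash(filename, manager_count):
    """Hash split (an assumption): the manager index depends on a digest of
    the name, so the split does not depend on how names are spelled."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % manager_count


for name in ["index.html", "images/logo.gif", "intro.html", "style.css"]:
    print(name, manager_by_first_letter(name), manager_by_hash(name, 2))
```

With the range split, three of the four sample names land on manager 0 because they start with the same letter. A hash of the name tends to spread such clusters more evenly, at the cost of losing the simple alphabetical rule.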

The manager decides which files are kept on each server. If the system operates as a distributed web server, the manager must keep at least one copy of each file (preferably more, to prevent loss of data when a server fails), so moving files between the servers must be handled with care. In a web proxy system, the manager uses the servers as caching space and keeps only the most needed files.

The most needed files are identified using a caching algorithm. It can be either a simple LRU or a more complex algorithm. We recommend an algorithm that considers the following aspects:

recently used - A file that was used a short time ago has a high chance of being used again soon.

most frequently used - A file that is requested often has a high chance of being used again.

distance from server - Files from nearby origin servers load quickly, so they can be fetched directly from the origin server. It is better to spend cache space on files from distant servers.

file size - Saving smaller files allows more files to be saved.

file types - The most requested files are HTML files. HTML files are also small relative to image files, so it is better to save HTML files (or any other common file type) than rare file types. Also, when an HTML file is in the cache, it loads quickly and lets the user read the text while the other components are still being loaded.
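
One way to combine the aspects above is a single score per file, with the lowest-scoring file evicted first. The following is a minimal sketch and not a recommended formula: the field names, the way each aspect is turned into a number, the plain sum and the bonus for HTML are all assumptions, and real weights would have to be tuned by measurement. It assumes the two usage criteria are tracked as time since last use and a request count.

```python
import math
from dataclasses import dataclass


@dataclass
class FileStats:
    seconds_since_last_use: float
    use_count: int             # how many times the file was requested
    origin_latency_ms: float   # stand-in for "distance from server"
    size_bytes: int
    file_type: str             # e.g. "html", "gif"


# Hypothetical bonus for common file types.
TYPE_BONUS = {"html": 1.0}


def cache_score(stats):
    """Higher score means the file is more worth keeping in the cache."""
    recency = 1.0 / (1.0 + stats.seconds_since_last_use)
    frequency = math.log1p(stats.use_count)
    distance = math.log1p(stats.origin_latency_ms)   # far files score higher
    smallness = 1.0 / math.log2(2 + stats.size_bytes)
    type_bonus = TYPE_BONUS.get(stats.file_type, 0.0)
    return recency + frequency + distance + smallness + type_bonus


def pick_eviction_victim(files):
    """Return the name of the lowest-scoring file in a {name: FileStats} dict."""
    return min(files, key=lambda name: cache_score(files[name]))


files = {
    "index.html": FileStats(10, 50, 200, 4_000, "html"),
    "photo.gif": FileStats(3600, 2, 200, 500_000, "gif"),
}
print(pick_eviction_victim(files))  # photo.gif
```

In the example both files come from equally distant servers, so the large, rarely used image is evicted before the small, frequently used HTML page.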
 

The manager manages all file transfers between the servers and should always have an up-to-date picture of which files are on each server.
The manager should hold a file table that lists the files on each server, and should update this table each time a request is acknowledged.
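
A file table of this kind could look like the sketch below. The class name, the method names and the server names are hypothetical; the text only requires that the table is updated once a request has been acknowledged. The can_delete check illustrates the web server rule that at least one copy of each file must remain.

```python
from collections import defaultdict


class FileTable:
    """Maps each file name to the set of servers that hold a copy."""

    def __init__(self):
        self._servers_by_file = defaultdict(set)

    def on_store_acknowledged(self, server, filename):
        """Record a copy only after the server acknowledged storing it."""
        self._servers_by_file[filename].add(server)

    def on_delete_acknowledged(self, server, filename):
        """Forget a copy only after the server acknowledged deleting it."""
        servers = self._servers_by_file.get(filename)
        if servers is None:
            return
        servers.discard(server)
        if not servers:
            del self._servers_by_file[filename]

    def servers_for(self, filename):
        """Sorted list of servers holding the file (empty if unknown)."""
        return sorted(self._servers_by_file.get(filename, ()))

    def can_delete(self, server, filename, min_copies=1):
        """True if removing this copy leaves at least min_copies elsewhere."""
        remaining = self._servers_by_file.get(filename, set()) - {server}
        return len(remaining) >= min_copies


table = FileTable()
table.on_store_acknowledged("server-a", "/index.html")
table.on_store_acknowledged("server-b", "/index.html")
print(table.servers_for("/index.html"))             # ['server-a', 'server-b']
print(table.can_delete("server-a", "/index.html"))  # True
table.on_delete_acknowledged("server-a", "/index.html")
print(table.can_delete("server-b", "/index.html"))  # False
```

Updating the table only on acknowledgment keeps it from listing a copy that a failed transfer never created.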