Introduction to FTS

Client-Server

The client-server approach is one of the most common techniques for communication between applications over a network. In this approach, a client application sends a request to a (possibly remote) server application, which performs the request and returns a result (a reply). Consider, for example, accessing your bank account through a web browser. Your browser is the client application, communicating with a remote web server running on the bank's computer. The operations you perform through the bank's web interface are translated into requests sent from your browser to the server. The server processes your requests and replies with results in the form of visual data (web pages), which your browser displays to you.
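To make the request/reply pattern concrete, here is a minimal Python sketch of a toy bank server and its client. It is an illustration under stated assumptions, not part of FTS: the line-based text protocol, the operation names `BALANCE` and `DEPOSIT`, and the `serve`/`call` helpers are all hypothetical. A connected `socket.socketpair()` stands in for a real network connection so that the example runs in a single process.

```python
import socket
import threading

# Hypothetical server-side state: account name -> balance.
accounts = {"alice": 100}

def serve(conn: socket.socket) -> None:
    """Server: read one request per line, perform it, send back the reply."""
    with conn, conn.makefile("r", encoding="utf-8") as requests:
        for line in requests:  # the loop ends when the client closes its end
            op, account, *args = line.split()
            if op == "BALANCE":
                reply = str(accounts[account])
            elif op == "DEPOSIT":
                accounts[account] += int(args[0])
                reply = str(accounts[account])
            else:
                reply = "ERROR unknown operation"
            conn.sendall((reply + "\n").encode("utf-8"))

def call(conn: socket.socket, replies, text: str) -> str:
    """Client: send one request and block until its reply arrives."""
    conn.sendall((text + "\n").encode("utf-8"))
    return replies.readline().strip()

# A connected socket pair stands in for a real network connection.
client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=serve, args=(server_sock,))
server.start()

with client_sock, client_sock.makefile("r", encoding="utf-8") as replies:
    print(call(client_sock, replies, "BALANCE alice"))     # prints 100
    print(call(client_sock, replies, "DEPOSIT alice 50"))  # prints 150

server.join()  # the server thread exits once the client has disconnected
```

Each `call` blocks until the server replies, so the client cannot make progress while the server is unavailable.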

Middleware Standards

An ongoing effort in academia and industry to standardize client-server interaction has resulted in the development of middleware: specialized layers of software that hide all the inter-application (and inter-machine) communication protocols behind a simple method call from a client application to a server interface, which can be located anywhere in the network. Examples of such middleware standards include CORBA (OMG), J2EE (Sun Microsystems), .NET (Microsoft) and Web Services (W3C). Most of these standards aim to build distributed applications as collections of distributed objects that communicate with each other regardless of the operating environment (operating system, programming language, etc.) in which each object is deployed.
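The following Python sketch illustrates how a method call can hide the communication behind it. It is a simplified, hand-written analogue of the client-side proxy (stub) and server-side dispatcher (skeleton) that middleware such as CORBA generates from interface definitions; it does not use any real middleware API. The `BankAccount`, `Stub` and `Skeleton` classes and the JSON message format are assumptions made for illustration, and the "network" is a plain function call.

```python
import json
from typing import Any, Callable

class BankAccount:
    """Server-side object (hypothetical interface)."""
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance

class Skeleton:
    """Server side: unmarshals a request message and invokes the real object."""
    def __init__(self, target: Any) -> None:
        self.target = target

    def handle(self, message: bytes) -> bytes:
        request = json.loads(message)
        method = getattr(self.target, request["method"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")

class Stub:
    """Client side: turns an ordinary method call into a request message."""
    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Called for any method name not defined on Stub itself.
        def remote_call(*args: Any) -> Any:
            message = json.dumps({"method": name, "args": list(args)}).encode("utf-8")
            reply = json.loads(self._transport(message))
            return reply["result"]
        return remote_call

# A real middleware would send the message across the network; here the
# transport is a direct function call so the example stays self-contained.
skeleton = Skeleton(BankAccount(balance=100))
account = Stub(skeleton.handle)
print(account.deposit(50))  # prints 150
```

The client code only sees `account.deposit(50)`; marshalling the call into a message and dispatching it on the server are handled by the stub and the skeleton, which is the role a middleware layer plays.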

Handling Server Failures

One inherent drawback of the client-server technique is that a client depends on the server remaining operational. As more clients depend on the same server, server downtime becomes more costly. Several approaches have been taken to reduce this effect, all of which can be regarded as different kinds of backup techniques:

Cold backup: The server data is periodically saved to a backup copy. When the server fails, it is restarted from the most recent backup. This method is also known as checkpointing.
Warm/Hot backup: The primary server's data is continuously propagated to backup servers. When the primary server fails, one of the backup servers becomes the new primary after a short delay, during which it may need to catch up on updates it has received from the primary but not yet applied.
The above methods are considered passive replication methods, since the backup servers do not actively serve clients. This distinguishes them from the last method:
Active replication: Each server holds a replica of the same data, and each server actively serves clients. As clients send requests to update the data, the servers synchronize the updates among themselves to remain consistent with each other. When one of the servers fails, its clients are immediately redirected to the other servers.
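The two passive methods can be contrasted in a few lines of Python. This is a toy sketch, assuming a server whose whole state is a dictionary of balances and whose updates are `(account, amount)` pairs; the `Server` and `Backup` classes and the checkpoint interval are hypothetical. The cold backup restarts from a copy of the state and loses any update made after that copy, while the warm/hot backup receives every update as it happens and only has to apply its pending ones before taking over.

```python
from __future__ import annotations
import copy

class Server:
    """A toy server whose entire state is a dict of account balances."""
    def __init__(self) -> None:
        self.data: dict[str, int] = {}

    def apply(self, update: tuple[str, int]) -> None:
        account, amount = update
        self.data[account] = self.data.get(account, 0) + amount

class Backup(Server):
    """Warm/hot backup: receives every update, applies pending ones on takeover."""
    def __init__(self) -> None:
        super().__init__()
        self.pending: list[tuple[str, int]] = []

    def receive(self, update: tuple[str, int]) -> None:
        self.pending.append(update)

    def take_over(self) -> None:
        for update in self.pending:  # catch up before serving clients
            self.apply(update)
        self.pending.clear()

updates = [("alice", 100), ("bob", 40), ("alice", -30)]

# Cold backup (checkpointing): copy the whole state after every second update.
primary = Server()
checkpoint: dict[str, int] = {}
for i, update in enumerate(updates):
    primary.apply(update)
    if i % 2 == 1:
        checkpoint = copy.deepcopy(primary.data)
restarted = Server()  # the primary fails and is restarted from the checkpoint
restarted.data = copy.deepcopy(checkpoint)
print(restarted.data)  # prints {'alice': 100, 'bob': 40}: the last update is lost

# Warm/hot backup: the primary propagates each update as it applies it.
primary = Server()
backup = Backup()
for update in updates:
    primary.apply(update)
    backup.receive(update)
backup.take_over()  # the primary fails and the backup becomes the new primary
print(backup.data)  # prints {'alice': 70, 'bob': 40}: no update is lost here
```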
This last method has obvious benefits: very short downtime, and the client load is shared among multiple servers. However, maintaining consistent data, at least as perceived by the clients, is non-trivial both to implement and in terms of cost. Integrating one or more of the above methods into a distributed service reduces its downtime from the client's point of view, thus making it highly available. The service can then tolerate the failure of one or more servers; in other words, it becomes fault tolerant.
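Active replication can be sketched the same way. In this hypothetical Python model every replica serves some clients, each update is applied at all live replicas, and a client whose replica has failed is redirected to a surviving one. The classes and the `alive` flag are illustrative assumptions; in particular, the plain loop in `update` stands in for the synchronization protocol and hides the hard part, namely making concurrent updates that arrive at different replicas take effect in the same order everywhere.

```python
from __future__ import annotations

class Replica:
    """One server holding a full copy of the data (account -> balance)."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: dict[str, int] = {}

    def apply(self, account: str, amount: int) -> None:
        self.data[account] = self.data.get(account, 0) + amount

class ReplicatedService:
    """All live replicas serve clients; every update is applied at each of them."""
    def __init__(self, names: list[str]) -> None:
        self.replicas = [Replica(n) for n in names]

    def live(self) -> list[Replica]:
        return [r for r in self.replicas if r.alive]

    def redirect(self, contact: Replica) -> Replica:
        # A client whose replica has failed is sent to a surviving one.
        return contact if contact.alive else self.live()[0]

    def update(self, contact: Replica, account: str, amount: int) -> Replica:
        contact = self.redirect(contact)
        for replica in self.live():  # stand-in for the real synchronization protocol
            replica.apply(account, amount)
        return contact

    def read(self, contact: Replica, account: str) -> tuple[Replica, int]:
        contact = self.redirect(contact)
        return contact, contact.data.get(account, 0)

service = ReplicatedService(["s1", "s2", "s3"])
s1, s2, s3 = service.replicas
client_a, client_b = s1, s2  # the client load is spread over the replicas
client_a = service.update(client_a, "alice", 100)
client_b = service.update(client_b, "alice", -30)

s1.alive = False  # s1 crashes
client_a, balance = service.read(client_a, "alice")
print(client_a.name, balance)  # prints: s2 70
```

Because both updates reached every live replica, `s2` can answer client A's read immediately after `s1` fails, without any recovery step.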

Group Communication

A group communication (GC) toolkit is a software layer that provides guarantees about the delivery of messages between the members of a process group. The processes (applications) in the group use the toolkit to send messages, either point to point or to the entire group (multicast). These processes also share a common knowledge of the view: the list of all group members that are still functioning. The view is updated whenever at least one member is suspected to be faulty, is leaving, or is joining the group. Whenever that happens, the "non-faulty" group members need to agree upon and install a new view of the group. One of the most fundamental attributes of GC is reliable communication: a message sent by a process during one view is guaranteed to be received by all the processes that install the next view together with the sender. GC also provides total ordering of messages: all messages are delivered in the same order at all recipients. This property is essential for data updates, since updates must be applied in the same order at every replica of the data on the participating servers for the replicas to remain consistent.
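Total ordering can be implemented in several ways; one common scheme uses a fixed sequencer that stamps every multicast with a global sequence number, while each member holds back a message until all lower-numbered messages have been delivered. The Python sketch below illustrates that scheme only. The source does not say which ordering algorithm a given GC toolkit (or FTS) uses, the class names are hypothetical, and view changes, failures and retransmission are not modeled.

```python
from __future__ import annotations
import random

class Sequencer:
    """Fixed sequencer: stamps each multicast with the next global sequence number."""
    def __init__(self) -> None:
        self.counter = 0

    def stamp(self, msg: str) -> tuple[int, str]:
        stamped = (self.counter, msg)
        self.counter += 1
        return stamped

class Member:
    """Group member that delivers messages strictly in sequence-number order."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.next_seq = 0
        self.held_back: dict[int, str] = {}
        self.delivered: list[str] = []

    def receive(self, seq: int, msg: str) -> None:
        self.held_back[seq] = msg
        # Deliver every held-back message whose predecessors are all delivered.
        while self.next_seq in self.held_back:
            self.delivered.append(self.held_back.pop(self.next_seq))
            self.next_seq += 1

view = [Member("s1"), Member("s2"), Member("s3")]  # members of the current view
sequencer = Sequencer()
multicasts = [sequencer.stamp(m) for m in ["deposit 100", "withdraw 30", "deposit 5"]]

rng = random.Random(7)
for member in view:
    arrivals = multicasts[:]
    rng.shuffle(arrivals)  # the network may reorder messages differently per member
    for seq, msg in arrivals:
        member.receive(seq, msg)

# Whatever the arrival order, every member delivers the same sequence.
assert all(m.delivered == view[0].delivered for m in view)
print(view[0].delivered)  # prints ['deposit 100', 'withdraw 30', 'deposit 5']
```

The final `assert` holds for any arrival order, which is exactly the property replicas need in order to apply updates identically.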

FTS

FTS (Fault-Tolerance Service) is a CORBA-based infrastructure that can render a CORBA application service fault tolerant by means of active replication. FTS uses group communication to synchronize client updates among its servers.
FTS's main features are: