Olga Brukman (CS, Technion)
Wednesday, 23.11.2011, 11:30
This talk introduces theoretical foundations for system architectures and algorithms for creating truly robust autonomic systems -- systems that are able to recover automatically from unexpected failures. We consider various settings of system transparency, covering both black box and transparent box software packages. The general assumption is that a software package fails when it encounters an unexpected environment state -- a state the package was not programmed to cope with. Creating a system that anticipates every possible environment state is not feasible due to the size of the environment. Thus, an autonomic system should be designed so that it can overcome an unexpected environment state, either by executing a recovery action that restores a legal state or by finding a new program that respects the specifications and achieves the software package goals in the current environment.
In the first part of this talk, we consider software packages to be black boxes. We propose modeling software package flaws (bugs) by assuming eventual Byzantine behavior of the package. We then propose the autonomic recoverer, a general, yet practical, framework and paradigm for the monitoring and recovery of systems. In the second part, we consider a software package to be a transparent box and introduce the recovery-oriented programming paradigm. Programs designed according to this paradigm include important safety and liveness properties, together with recovery actions, as an integral part of the program. We design a pre-compiler that produces augmented code for monitoring the properties and executing the recovery actions upon a property violation. Finally, in the third part, we consider a highly dynamic environment, which typically implies that there are no realizable specifications for the environment, i.e., there does not exist a program that respects the specifications for every given environment. We suggest searching for a program at run time by trying all possible programs on environment replicas in parallel. We design control search algorithms that exploit various environment properties.
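
As a rough illustration of the second part, the sketch below shows one way a program could carry a safety property and its recovery action alongside the code being monitored. It is not the pre-compiler output described in the talk; the names MonitoredProperty, run_with_recovery, and the bounded-counter example are all hypothetical and chosen only to make the idea concrete. Liveness properties are not covered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class MonitoredProperty:
    """A safety property paired with the recovery action for its violation.

    Hypothetical structure for illustration; not taken from the talk.
    """
    name: str
    holds: Callable[[State], bool]     # predicate over the program state
    recover: Callable[[State], None]   # restores a legal state in place


def run_with_recovery(step: Callable[[State], None],
                      state: State,
                      properties: List[MonitoredProperty],
                      steps: int) -> List[str]:
    """Run `step` repeatedly and check every property after each step.

    When a property is violated, its recovery action is executed.
    Returns the names of the violated properties, in order of occurrence.
    """
    violations: List[str] = []
    for _ in range(steps):
        step(state)
        for prop in properties:
            if not prop.holds(state):
                violations.append(prop.name)
                prop.recover(state)
    return violations


# Assumed toy program: a counter that must stay within [0, 3].
def increment(state: State) -> None:
    state["counter"] += 1


def reset_counter(state: State) -> None:
    state["counter"] = 0


bounded = MonitoredProperty(
    name="counter_bounded",
    holds=lambda s: 0 <= s["counter"] <= 3,
    recover=reset_counter,
)

state: State = {"counter": 0}
violations = run_with_recovery(increment, state, [bounded], steps=10)
print(violations, state)
```

In this sketch the checks are written by hand inside the run loop; the approach described in the talk instead has a pre-compiler generate the monitoring and recovery code from properties declared in the program.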