Technical Report MSC-2013-23

Title: Cost Aware Fault Recovery in Clouds
Authors: Assaf Israel
Supervisors: Danny Raz
Abstract: Maintaining high availability of IaaS services at a reasonable cost is a challenging task that has received recent attention due to the growing popularity of Cloud computing as a preferred means of affordable IT outsourcing. In large data-centers faults are bound to happen, and thus the only reasonable cost-effective method of providing high availability of services is an SLA-aware recovery plan; that is, a mapping of the service VMs onto backup machines where they can be executed in case of a failure. The recovery process may benefit from powering on some of these machines in advance, since redeployment on powered machines is much faster. However, this comes with an additional maintenance cost, so the real problem is how to balance the expected improvement in recovery time against the cost of machine activation.

We model this problem as an offline optimization problem and present a bicriteria approximation algorithm for it. While this is the first algorithm with a performance guarantee for this problem, it is somewhat complex to implement in practice. Thus, we further present a much simpler and more practical heuristic based on a greedy approach. We evaluate the performance of this heuristic over real data-center data, and show that it performs well with respect to scale, hierarchical faults, and varying costs. Our results indicate that our scheme can reduce the overall recovery costs by 10-15% when compared to currently used approaches.
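To make the trade-off described in the abstract concrete, the following is a minimal illustrative sketch of a greedy selection of backup machines to power on in advance. It is not the heuristic from the report, whose details are not given on this page. The names (`VM`, `expected_saving`, `greedy_power_on`), the recovery times, the failure probabilities, the penalty rates, and the budget are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

# All constants and fields below are illustrative assumptions,
# not values or definitions taken from the report.
COLD_RECOVERY_S = 300.0  # assumed redeploy time on a powered-off machine
WARM_RECOVERY_S = 60.0   # assumed redeploy time on a powered-on machine


@dataclass
class VM:
    name: str
    backup: str          # backup machine assigned by the recovery plan
    fail_prob: float     # assumed probability that this VM needs recovery
    penalty_rate: float  # assumed SLA penalty per second of downtime


def expected_saving(vms, machine):
    """Expected SLA penalty avoided if `machine` is powered on in advance."""
    return sum(
        v.fail_prob * v.penalty_rate * (COLD_RECOVERY_S - WARM_RECOVERY_S)
        for v in vms
        if v.backup == machine
    )


def greedy_power_on(vms, activation_cost, budget):
    """Pick machines to pre-activate, best net gain per unit cost first.

    `activation_cost` maps machine name to a strictly positive cost.
    Machines whose activation cost exceeds their expected saving are skipped.
    """
    candidates = []
    for machine, cost in activation_cost.items():
        gain = expected_saving(vms, machine) - cost
        if gain > 0:
            candidates.append((gain / cost, machine))

    chosen, spent = [], 0.0
    for _, machine in sorted(candidates, reverse=True):
        cost = activation_cost[machine]
        if spent + cost <= budget:
            chosen.append(machine)
            spent += cost
    return chosen


if __name__ == "__main__":
    vms = [
        VM("a", "m1", 0.1, 1.0),
        VM("b", "m1", 0.1, 1.0),
        VM("c", "m2", 0.01, 1.0),
    ]
    costs = {"m1": 20.0, "m2": 10.0, "m3": 5.0}
    print(greedy_power_on(vms, costs, budget=100.0))
```

In this toy setting the recovery plan (the VM-to-backup mapping) is fixed in advance, so the only decision is which machines to power on. The problem described in the abstract also involves choosing the mapping itself, which this sketch does not attempt.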

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion