Assaf Israel, M.Sc. Thesis Seminar
Wednesday, 5.6.2013, 15:30
Maintaining high availability of Ifrastructure-as-a-Service services at a reasonable cost is a challenging task that received recent attention due to the growing popularity of Cloud computing as a preferred means of affordable IT outsourcing. In large data-centers, faults are prone to happen and thus the only reasonable cost-effective method of providing high availability of services is an SLA aware recovery plan; that is, a mapping of the service VMs onto backup machines where they can be executed in case of a failure. The recovery process may benefit from powering on some of these machines in advance, since redeployment on powered machines is much faster. However, this comes with an additional maintenance cost, so the real problem is how to balance between the expected recovery time improvement and the cost of machines activation.
We model this problem as an offline optimization problem and present a bicriteria approximation algorithm for it. While this is the first performance guaranteed algorithm for this problem, it is somewhat complex to implement in practice. Thus, we further present a much simpler and practical heuristic based on a greedy approach. We evaluate the performance of this heuristic over real data-center data, and show that it performs well in terms of scale, hierarchical faults and variant costs. Our results indicate that our scheme can reduce the overall recovery costs by 10-15% when compared to currently used approaches. We also show that fault recovery cost aware VM placement may farther help reducing the expected recovery costs, as it can reduce the backup machine activations costs.