Asaf Cidon (Stanford University)
Wednesday, 4.7.2012, 11:30
Randomized node selection is widely used in large-scale, distributed storage systems to both load balance chunks of data across the cluster and select replica nodes to provide data durability. We argue that while randomized node selection is great for load balancing, it fails to protect data under a common failure scenario. We present MinCopysets, a simple, general-purpose and scalable replication technique to improve data durability while retaining the benefits of randomized load balancing. Our contribution is to decouple the mechanisms used for load balancing from data replication: we use randomized node selection for load balancing but derandomize node selection for data replication. We show that MinCopysets provides significant improvements in data durability. For example in a 1000 node cluster under a power outage that kills 1% of the nodes, it reduces the probability of data loss from 99.7% to 0.02% compared to random replica selection. We implemented MinCopysets on two open source distributed storage systems, RAMCloud and HDFS, and demonstrate that MinCopysets causes negligible overhead on normal storage operations.
In collaboration with Ryan Stutsman, Stephen Rumble, Sachin Katti, John Ousterhout and Mendel Rosenblum.