יום רביעי, 29.4.2015, 11:30
Log-structured data stores (LSM-DSs) are widely accepted as the state-of-the-art implementation of key-value stores. They replace random disk writes with sequential I/O, by accumulating large batches of updates in an in-memory data structure and merging it with the on-disk store in the background. While LSM-DS implementations proved to be highly successful at masking the I/O bottleneck, scaling them up on multicore CPUs remains a challenge. This is nontrivial due to their often rich APIs, as well as the need to coordinate the RAM access with the background I/O.
We present cLSM, an algorithm for scalable concurrency in LSM-DS, which exploits multiprocessor-friendly data structures and non-blocking synchronization. cLSM supports a rich API, including consistent snapshot scans and general non-blocking read-modify-write operations. We implement cLSM based on the popular LevelDB key value store, and evaluate it using intensive synthetic workloads as well as ones from production web-serving applications. Our algorithm outperforms state of the art LSMDS implementations, improving throughput by 1.5x to 2.5x. Moreover, cLSM demonstrates superior scalability with the number of cores (successfully exploiting twice as many cores as the competition).
This is a joint work with Guy Golan, Eddie Bortnikov (Yahoo Labs) and Idit Keidar (Technion, Yahoo Labs), presented at EuroSys 2015.
Eshcar Hillel is a Research Scientist at Yahoo Labs, currently working in the Scalable Search Systems Research team. Eshcar received the B.Sc. degree in Software Engineering and the Ph.D. degree in Computer Science from the Technion, in 2002 and 2011, respectively. Prior to her graduate studies she worked at Hewlett-Packard as part of the Non-Stop relational database system team.