Ariel Kolikant, M.Sc. Thesis Seminar
Deduplication reduces the size of the data stored in large-scale storage systems by replacing duplicate data blocks with references to their unique copies. This creates dependencies between files that contain similar content and complicates the management of data in the system. In the work presented in this seminar, we address the problem of data migration, where files are remapped between different volumes because of system expansion or maintenance. The challenge of determining which files and blocks to migrate has been studied extensively for systems without deduplication. In the context of deduplicated storage, however, only simplified migration scenarios have been considered. In our work, we formulated the general migration problem for deduplicated systems as an optimization problem whose objective is to minimize the system’s size while ensuring that the storage load is evenly distributed between the system’s volumes and that the network traffic required for the migration does not exceed its allocation. We then modeled this problem as an integer linear program (ILP) and compared the results of the ILP-based algorithm to those of two other algorithms that solve the same migration problem: a greedy algorithm and a clustering algorithm. Our ILP algorithm consistently obtains the best solutions to the problem, though it requires significantly longer execution times.
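To illustrate the three quantities that the migration problem trades off, the following is a minimal Python sketch on toy data. It is not the formulation or any of the algorithms presented in the seminar. The file contents, the volume names, the function names (volume_blocks, evaluate), the assumption that all blocks have equal size, and the simplified definitions of traffic and imbalance are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data: each file is a set of block IDs (equal-sized blocks assumed).
files: Dict[str, Set[str]] = {
    "f1": {"a", "b", "c"},
    "f2": {"b", "c", "d"},
    "f3": {"e", "f"},
}

def volume_blocks(placement: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return the set of unique blocks each volume stores under a file placement."""
    volumes: Dict[str, Set[str]] = {}
    for name, vol in placement.items():
        # Deduplication: a block shared by several files on a volume is stored once.
        volumes.setdefault(vol, set()).update(files[name])
    return volumes

def evaluate(before: Dict[str, str], after: Dict[str, str]) -> Tuple[int, int, int]:
    """Return (system size, traffic, imbalance) of a migration plan, in blocks."""
    old, new = volume_blocks(before), volume_blocks(after)
    all_vols = set(before.values()) | set(after.values())
    sizes = [len(new.get(vol, set())) for vol in all_vols]
    # Assumed traffic model: a block is transferred only if its target volume
    # did not already hold a copy before the migration.
    traffic = sum(len(new.get(vol, set()) - old.get(vol, set())) for vol in all_vols)
    # Assumed balance measure: gap between the largest and smallest volume.
    return sum(sizes), traffic, max(sizes) - min(sizes)

before = {"f1": "v1", "f2": "v2", "f3": "v2"}
after = {"f1": "v1", "f2": "v1", "f3": "v2"}  # move f2 next to the similar file f1

print(evaluate(before, before))  # no migration
print(evaluate(before, after))   # migrate f2 to v1
```

In this toy instance, moving f2 onto the volume that already holds f1 shrinks the system from 8 to 6 stored blocks and transfers only one block, because the blocks the two files share are stored once. An optimizer for the full problem would search over such placements subject to the balance and traffic limits.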