Communication Aspects in MapReduce

Shantanu Sharma (Ben-Gurion University)
Sunday, 13.12.2015, 11:30
Taub 401

We deal with design of models and algorithms for MapReduce. In MapReduce, an input is replicated to several reducers, and hence, the replication of an input dominates the communication cost, which is a performance measure of a MapReduce algorithm. Our MapReduce model is favorable for various matching problems, where the input is replicated as the output. In particular, we present two classes of matching problems, namely the all-to-all and the X-to-Y matching problems, and show that these matching problems are NP-hard in terms of optimization of the communication cost and the number of reducers. We also provide several near-optimal approximation algorithms for both the problems.

Next, we provide a new algorithmic approach for MapReduce algorithms that decreases the communication cost significantly, while regarding the locality of data and mappers-reducers. The suggested technique avoids the movement of data that does not participate in the final output. We will see the impact of the technique on skewjoin, multi-rounds jobs, and geographically distributed computations. We evaluate impacts on the communication cost for solving the problem of interval join and provide one round algorithm for 2-way interval join.

Foto Afrati (National Technical University of Athens), Shlomi Dolev, Ephraim Korach, Shantanu Sharma (Ben-Gurion University), and Jeffrey Ullman (Stanford University)

Back to the index of events