Time+Place: Thursday 18/06/2009 10:30 Room 337-8 Taub Bld.
Title: Cluster-Based Computation of Relational Joins
Speaker: Jeffrey D. Ullman NOTE UNUSUAL DAY AND TIME http://infolab.stanford.edu/~ullman/
Affiliation: Stanford University
Host: Ron Pinter

Abstract:


The prevalence of large racks of interconnected processor nodes
forces us to take another look at how to exploit parallelism
when taking the join of large relations.  Sometimes, there is
a gain in total cost to be had by distributing pieces of each
relation to several different nodes and computing the join of
several large relations at once.  The optimization problem is
to pick the degree of replication of each relation, under the
constraint that the total number of compute-nodes is fixed.
We set up this problem as a nonlinear optimization and show that
there is always a solution (which must be approximated by
rounding to the nearest integers).  For some of the most
common types of join -- star joins and chain joins -- we give
closed-form solutions to the optimization problem.  Finally, we
point out that the join algorithm we propose can be implemented
using features already present in Hadoop, the open-source
implementation of map-reduce.