Time+Place: Tuesday 11/05/2010 14:30 Room 337-8 Taub Bld.
Title: Oblivious and cost aware distributed load sharing in the Cloud
Speaker: Danny Raz http://www.cs.technion.ac.il/people/danny/
Affiliation: Computer Science, Technion
Host:

Abstract:


In many cases, large scale cloud-based services are provided simultaneously
from several, potentially distant sites. The actual choice of the specific
site and the specific server that would fulfill a given user request has a
critical impact on the overall performance of the service. This gives rise
to a highly complex optimization problem, which often involves multiple
objectives and many parameters. 
Irrespective of the precise optimization criteria, any attempt to address
such an optimization problem incurs significant overhead in collecting the
required (state-dependent) information from the various network locations.
One way to address this problem is through an oblivious approach, i.e., a
distributed load-sharing scheme that does not use any state information. We
revisit this extensively studied problem and present a novel scheme: in
addition to the regular job request, which is assigned to a randomly chosen
server, a low-priority replica of the request is sent to a different
randomly chosen server. We show that, when servers can coordinate the
removal of redundant copies upon completion of a job, the performance of the
system improves significantly, even under high load conditions.
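
As a rough illustration of the replica idea (not the model analyzed in the
talk), the following Python sketch simulates it in discrete time. Everything
here is an assumption made for the example: jobs take exactly one time slot,
each server serves one job per slot with strict priority for regular copies,
redundant copies are removed instantly, and the names make_jobs and
mean_response are hypothetical. Calling mean_response on the same job list
with use_replicas set to False and then True compares the two schemes.

```python
import random
from collections import deque

def make_jobs(n_servers, n_slots, load, seed=0):
    """Hypothetical workload: one (arrival_slot, primary, replica) tuple per job."""
    rng = random.Random(seed)
    jobs = []
    for t in range(n_slots):
        for _ in range(n_servers):  # n_servers Bernoulli(load) arrivals per slot
            if rng.random() < load:
                # Two distinct servers chosen obliviously (no state is consulted).
                primary, replica = rng.sample(range(n_servers), 2)
                jobs.append((t, primary, replica))
    return jobs

def mean_response(jobs, n_servers, use_replicas):
    """Mean number of slots from arrival to completion of either copy."""
    high = [deque() for _ in range(n_servers)]  # regular copies
    low = [deque() for _ in range(n_servers)]   # low-priority replicas
    done_at = [None] * len(jobs)
    remaining, nxt, t = len(jobs), 0, 0
    while remaining:
        # Enqueue the jobs that arrive in slot t.
        while nxt < len(jobs) and jobs[nxt][0] == t:
            _, primary, replica = jobs[nxt]
            high[primary].append(nxt)
            if use_replicas:
                low[replica].append(nxt)
            nxt += 1
        # Each server serves one unit-size job: regular copies first.
        for s in range(n_servers):
            for queue in (high[s], low[s]):
                # Drop copies whose job already finished elsewhere.
                while queue and done_at[queue[0]] is not None:
                    queue.popleft()
                if queue:
                    job = queue.popleft()
                    done_at[job] = t + 1
                    remaining -= 1
                    break
        t += 1
    return sum(d - jobs[j][0] for j, d in enumerate(done_at)) / len(jobs)
```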
Another way to address this challenge is to use only a limited amount of
server state information; this is done, for example, in the well-known
Supermarket model studied by Mitzenmacher. However, this model does not
incorporate the cost associated with obtaining the state information. Our
focus is on a rigorous study of the right amount of monitoring; that is, we
want to maximize system utility by monitoring the needed servers without
over-monitoring. Following the theoretical model, we develop several
practical approaches in this context and study their expected performance.
We also show how the low-priority job replica scheme can be used to further
improve this model.
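
The trade-off between monitoring effort and balance can be sketched with a
static simplification of the limited-information idea: each job probes d
randomly chosen servers and joins the least loaded one. This is only an
illustrative balls-into-bins experiment, not the queueing model or the cost
model from the talk; the function name place_jobs and the choice to count
one probe per polled server as the monitoring cost are assumptions.

```python
import random

def place_jobs(n_servers, n_jobs, d, seed=0):
    """Return (maximum server load, total number of probes) for d probes per job."""
    rng = random.Random(seed)
    load = [0] * n_servers
    for _ in range(n_jobs):
        # Poll d distinct random servers; this is the monitoring step.
        probed = rng.sample(range(n_servers), d)
        # Send the job to the least loaded of the probed servers.
        target = min(probed, key=load.__getitem__)
        load[target] += 1
    return max(load), n_jobs * d
```

Running place_jobs with d = 1, 2, 3 on the same number of servers and jobs
shows how the maximum load changes as the probe count, and hence the
monitoring overhead, grows.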

This talk is based on joint papers with David Breitgand, Rami Cohen, Amir
Nahir, and Ariel Orda.

Short Bio:

Danny Raz received his doctoral degree from the Weizmann Institute of
Science, Israel, in 1995. From 1995 to 1997 he was a post-doctoral fellow at
the International Computer Science Institute (ICSI), Berkeley, CA, and a
visiting lecturer at the University of California, Berkeley. Between 1997
and 2001 he was a Member of Technical Staff at the Networking Research
Laboratory at Bell Labs, Lucent Technologies. In October 2000, Danny Raz
joined the faculty of the computer science department at the Technion,
Israel.

Danny Raz served as the general chair of OpenArch 2000, as a TPC co-chair of
IM 2009, as a TPC member for many conferences, including INFOCOM 2002-2003
and 2010, OpenArch 2000-2001-2003, and IM-NOMS 2001-2010, and as an Editor
of the IEEE/ACM Transactions on Networking (ToN). His primary research
interest is the theory and application of management-related problems in IP
networks, with a special emphasis on efficient resource utilization.