Guy Sagy, Ph.D. Thesis Seminar
Wednesday, 8.2.2012, 13:00
A basic requirement in many distributed systems is the ability to
detect objects whose score, according to a given function, exceeds
some threshold. Since an object's data can be partitioned over various
nodes, computing its global score requires collecting its data over
the network. A main challenge is to perform threshold queries or
monitoring with minimum network communication, i.e., without
collecting the data from the nodes to a central location.
In this talk I will present an application of geometric ideas for
performing threshold queries and top-k queries over distributed
databases with minimum communication. The proposed method can handle
general functions by representing them as a difference of monotonic
functions, and imposing thresholds on the latter.
In addition, I will present a novel scheme for communication reduction
in distributed monitoring using local constraints. Communication is
required only in the event that the local constraints are violated by
the incoming data. As opposed to previous work on geometric
monitoring, here the local constraints are tailored to fit the local
data distribution at each stream, and satisfy an optimality criterion.
The result is a substantial decrease in the required volume of
communication compared to previous state of the art, up to two orders
of magnitude in experiments with real-life data. Theoretical analysis
suggests that the reduction can sometimes be unbounded, and that it
typically improves with the dimensionality of the data, which is borne
out in the experiments.