Arnon Lazerson, Ph.D. Thesis Seminar
Sunday, 15.10.2017, 12:00
Emerging large-scale applications rely on continuous tracking of complex queries over collections of massive, dynamic, and physically-distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each distributed site), effective solutions also need to guarantee communication efficiency. Continuously collecting the data to a central location is infeasible in large scale applications, as the excess communication required interferes with the normal operation of the data network. Furthermore, in the case of battery operated devices such as wireless sensor nodes, central data accumulation depletes the power supply of individual devices, reducing the network lifetime.
In this talk, we focus on reducing the communication cost of distributed threshold monitoring queries used in numerous applications either directly or as the main building block for other queries (such as top-k and``heavy-hitters''). Such queries can also be naturally extended to the more general problem of approximate function monitoring, where the goal is to track the value of a function to within user-prescribed error bounds. We'll show that by carefully constructing local conditions that are indicative of a global change in the aggregated value, we can reduce the communication cost of distributed threshold monitoring and function approximation tasks by orders of magnitude in many important cases.