Technical Report PHD-2017-11

Title: Communication-efficient Algorithms for Distributed Stream Mining
Authors: Moshe Gabel
Supervisors: Assaf Schuster and Daniel Keren
PDFCurrently accessibly only within the Technion network
Abstract: Recent years have seen an explosion in the number of connected devices, which means not only growth in velocity and volume of data, but also that data sources are increasingly spatially distributed, incurring higher communication costs. Data mining algorithms often assume that data is centralized or that communication is inexpensive: the setting is implicitly assumed to be a data center. In settings like wireless sensor networks, however, communication costs limited battery power. Moreover, most work only considers one-shot computation: computing a result once from a fixed data set. Yet data is increasingly dynamic, and many applications require current results over a recent time window.

We focus on computing approximations over aggregated distributed data streams with reduced communication. We describe three novel distributed approximations for important non-linear functions: variance, Shannon's entropy, and least-squares regression. Using a geometric safe zone approach (also called geometric monitoring), we convert global approximation bounds on the data aggregate to local threshold constraints that each node can check independently. Our algorithms provide deterministic user-defined approximation bounds, while avoiding messages unless they are needed to maintain those bounds. Compared to the centralized solution, our algorithms reduce communication by up to two orders of magnitude on several real data sets that represent real applications, including machine health monitoring, network monitoring with netflows, traffic monitoring, and others. We also extend the formulation of geometric monitoring to variable-sized windows, which is critical to achieve low communication when window sizes are dynamic and vary across nodes.

CopyrightThe above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information

Remark: Any link to this technical report should be to this page (, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

To the list of the PHD technical reports of 2017
To the main CS technical reports page

Computer science department, Technion