Aviv Yehezkel, Ph.D. Thesis Seminar
Wednesday, 28.6.2017, 13:00
Sketch-based streaming algorithms allow efficient processing of big data.
These algorithms use small fixed-size storage to store a summary ("sketch") of the input data, and use probabilistic algorithms to accurately estimate the desired quantity.
A fundamental streaming problem is "cardinality estimation": given a very long stream of elements, the goal is to estimate the number of distinct elements.
This is a well-known problem with numerous applications for network monitoring, security, query optimization, query execution progress, etc.
This Ph.D. research focuses on several generalizations of the cardinality estimation problem.
We propose new streaming algorithms, analyze their efficiency and statistical performance, and use them for solving interesting network monitoring problems.
In this talk, two main results will be presented.
The first is a novel estimator to the cardinality of set intersection between two sets, which is shown to outperform all previously known estimators.
The second is a new framework for the cardinality estimation problem.
This framework consists of a novel sketch, and a family of algorithms that use this sketch to accurately estimate the cardinality of any set expression.