Omri Ben Eliezer (Tel-Aviv University)
Wednesday, 6.11.2019, 12:30
Suppose we receive a stream of data, sampling each arriving data element with some predetermined probability. It is well known that if the elements in the stream are chosen in advance, then a constant number of random samples suffice to make the sampled subset “represent” the whole stream with good probability. However, what if the elements are chosen adaptively, by an adversary that knows exactly, at any point along the stream, which elements were sampled previously? Does the same sample size suffice to guarantee “representability”?
The simplistic answer is negative, and we demonstrate this by describing an efficient adaptive attack. However, this attack is "theoretical only", requiring the universe of elements to be exponential in the stream length. This is not a coincidence: We show how to make the sampling algorithm robust against adaptive adversaries when the universe is smaller. As an application, this yields a sublinear-query adversarially robust randomized streaming algorithm for a wide range of data analysis problems, even in high dimensions.
Joint work with Eylon Yogev, to appear in PODS 2020. Full version: https://arxiv.org/abs/1906.11327