Time+Place: Thursday 02/02/2006 14:30 Room 337-8 Taub Bld.
Title: Sublinear Algorithms for Approximating String Compressibility and the Distribution Support Size
Speaker: Sofya Raskhodnikova http://theory.lcs.mit.edu/~sofya
Affiliation: Weizmann Institute
Host: Eldar Fischer

Abstract:


Imagine having to choose between a few compression schemes to
compress a very long file. Before deciding on the scheme, you
might want to obtain a rough estimate on how well each scheme
performs on your file. We consider the question of approximating
compressibility of a string with respect to a fixed compression
scheme, in sublinear time.

In the talk, we will concentrate on the run-length encoding and
a version of Lempel-Ziv as our example compression schemes. We
present algorithms and lower bounds for approximating
compressibility with respect to both schemes. We show that
compressibility with respect to Lempel-Ziv is related to
approximating the support size of a distribution. This problem
has been considered under different guises in the literature. We
prove a lower bound for it, at the heart of which is a
construction of two positive integer random variables, X and Y,
with very different expectations and the following condition on
the moments up to k:

E[X]/E[Y] = E[X^2]/E[Y^2] = ... = E[X^k]/E[Y^k].

Joint work with Dana Ron, Ronitt Rubinfeld, Amir Shpilka and
Adam Smith