Events

The Taub Faculty of Computer Science Events and Talks

Pixel Club: Clustering and Approximating High-Dimensional Streaming Data using Coresets

Dan Feldman (Callifornia Institute of Technology)

Sunday, 22.05.2011, 11:30

Room 337-8 Taub Bld.

Data analysis of massive data sets is common today for web-search (e.g. Google), social networking (e.g. Facebook), financial applications, supermarkets, bioinformatics and many other fields.

A coreset (or, core-set) for a given problem is a "compressed" representation of its input, in the sense that a solution for the problem with the (small) coreset as input would yield an approximate solution to the problem with the original (large) input. Using traditional techniques, a coreset usually implies provable linear time algorithms for the corresponding optimization problem, which can be computed in parallel, via one pass over the data, and using only logarithmic space (i.e, in the streaming model).

In this talk I will present a unified framework that yields the efficient construction of succinct coresets for several problems. Representing the data by a set F of positive functions over a ground set X, our framework forges a link between the combinatorial complexity of the function family F at hand (measured in terms of classical VC dimension) and the paradigm of coresets. Our coresets are obtained via randomized sampling according to a delicately designed sampling distribution. Examples in which our framework has been successful (and significantly improves over previous works) include the k-median, the k-line median, projective clustering, linear regression, low rank approximation, and subspace approximation problems.

[Back to the index of events]