Guillermo Sapiro (Duke University)
Tuesday, 25.12.2012, 11:30
We consider the problem of finding a few representatives for a dataset,
i.e., a subset of data points that efficiently describes the entire dataset.
We assume that each data point can be expressed as a linear combination of
the representatives and formulate the problem of finding the representatives
as a sparse multiple measurement vector problem. In our formulation, both
the dictionary and the measurements are given by the data matrix, and the
unknown sparse codes select the representatives via convex optimization. In
general, we do not assume that the data are lowrank or distributed around
cluster centers. When the data do come from a collection of low-rank
models, we show that our method automatically selects a few representatives
from each low-rank model. We also analyze the geometry of the
representatives and discuss their relationship to the vertices of the
convex hull of the data. We show that our framework can be extended to
detect and reject outliers in datasets, and to efficiently deal with new
observations and large datasets. The proposed framework and theoretical
foundations are illustrated with examples in video summarization and image
classification using representatives. Finally, we discuss how to extend this
when the data is given as pairwise distances. This is joint work with E.
Elhamifar and R. Vidal
Similar mathematical foundations based on sparse modeling
lead to the computation of networks and estimation of inverse covariances.
I will conclude the talk with some recent results on adding topological
constraints to these computations.
This is joint work with M. Fiori and P. Muse.