Tomer Koren, M.Sc. Thesis Seminar
Wednesday, 7.3.2012, 14:30
In recent years, stochastic approximation approaches such as Stochastic
Gradient Descent (SGD) and Stochastic Dual Averaging have become the
optimization method of choice for many learning problems, including linear
Support Vector Machines (SVMs). This is not surprising, since such methods
yield optimal generalization guarantees with only a single pass over the data.
They in a sense have an unbeatable runtime: the runtime to get to a desired
generalization goal is the same as the size of the data set required to do so.
In this work we show, for the first time, how to beat this unbeatable runtime,
and present a method that in a certain relevant regime, learns in runtime less
than the size of the minimal training set size required for learning. Our
method is based on a stochastic primal-dual approach, where the primal step is
akin to an importance-weighted SGD, and the dual step is a stochastic update
on the importance weights.