Katrina Ligett (Hebrew University of Jerusalem)
Wednesday, 25.5.2016, 12:30
Generalization, informally, is the ability of a learner to reflect not just its training data, but properties of the underlying distribution from which the data are drawn. When paired with empirical risk minimization, it is a fundamental goal of learning. Typically, we say that a learning algorithm generalizes if, given access to some training set drawn i.i.d. from an underlying data distribution, it returns a hypothesis whose empirical error (on the training data) is close to its true error (on the underlying distribution).
This is, however, a surprisingly brittle notion—even if the output of a learning algorithm generalizes, one may be able to extract additional hypotheses by performing further computations on the output hypothesis that do not themselves generalize. As an example, notice that the standard notion of generalization does not prevent a learner from encoding the entire training set in the hypothesis that it outputs, which in turn allows a data analyst to generate a hypothesis that over-fits to an arbitrary degree.
In this talk, we will explore what properties one might want from a notion of generalization, why, and why one should care. We will discuss a variety of candidate generalization notions, their properties, and their relationships. We will draw on recent connections between generalization and a formal privacy notion known as ``differential privacy.’’ There will be plenty of open questions.
No background in learning theory or in differential privacy will be assumed.
This talk will draw on a paper at the upcoming COLT, joint with Rachel Cummings, Kobbi Nissim, Aaron Roth, and Steven Wu.