Thursday, 3 January 2019, 10:30

Understanding deep learning calls for addressing three fundamental
questions: expressiveness, optimization and generalization.
Expressiveness refers to the ability of compactly sized deep neural
networks to represent functions capable of solving real-world problems.
Optimization concerns the effectiveness of simple gradient-based
algorithms in solving non-convex neural network training problems.
Generalization addresses the phenomenon of deep learning models not
overfitting despite having many more parameters than examples to learn
from. This talk will describe a series of works aimed at unraveling
some of the mysteries behind optimization and expressiveness. I will
begin by discussing recent analyses of optimization for deep linear
neural networks. By studying the trajectories of gradient descent, we
will derive the most general guarantee to date for efficient convergence
to a global minimum of a gradient-based algorithm training a deep network.
Moreover, in stark contrast to conventional wisdom, we will see that
gradient descent can sometimes train a deep linear network faster than
a classic linear model. In other words, depth can accelerate
optimization, even without any gain in expressiveness, and despite
introducing non-convexity to a formerly convex problem. In the second
(shorter) part of the talk, I will present an equivalence between
convolutional and recurrent networks (the most successful deep
learning architectures to date) and hierarchical tensor
decompositions. The equivalence provides answers to various questions
concerning expressiveness, yielding new theoretically backed tools for
deep network design.
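For readers unfamiliar with deep linear networks: such a network computes the product of its weight matrices, so adding layers changes the parameterization without adding expressiveness. The Python sketch below is a toy illustration under assumptions chosen only for simplicity, not a method from the talk: scalar "layers", a loss that depends only on the end-to-end product, L = (w_1 * ... * w_N - target)^2 / 2, all factors initialized to 1.0 (a balanced initialization), and a learning rate of 0.05. The function name `train_end_to_end` and every parameter value are hypothetical.

```python
import math


def train_end_to_end(depth, target=2.0, lr=0.05, tol=1e-6, max_steps=10_000):
    """Run gradient descent on L(w) = 0.5 * (w_1 * ... * w_depth - target) ** 2.

    Returns (end_to_end_value, steps): the product of the factors when
    |product - target| first drops below `tol`, and the number of gradient
    steps taken to get there (or max_steps if it never does).
    """
    # Balanced initialization (illustrative assumption): every factor starts
    # at 1.0, so all gradients are identical and the factors stay equal.
    weights = [1.0] * depth
    for step in range(max_steps):
        end_to_end = math.prod(weights)
        err = end_to_end - target
        if abs(err) < tol:
            return end_to_end, step
        # dL/dw_i = err * (product of all factors except w_i).
        # For depth 1 the "other factors" list is empty and math.prod gives 1.
        grads = [err * math.prod(weights[:i] + weights[i + 1:]) for i in range(depth)]
        # Plain gradient descent step on every factor simultaneously.
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return math.prod(weights), max_steps


if __name__ == "__main__":
    # Compare a single weight (classic linear model) with a depth-3 factorization.
    for depth in (1, 3):
        value, steps = train_end_to_end(depth)
        print(f"depth={depth}: end-to-end value {value:.6f} after {steps} steps")
```

In this toy run the depth-3 parameterization reaches the tolerance in fewer steps than the single weight. To first order in the learning rate, a gradient step on N equal factors moves the product w_e like a gradient step scaled by N·|w_e|^(2(N-1)/N), which behaves like an adaptive step size. This conveys the flavor of the "depth can accelerate optimization" claim but is not the analysis presented in the talk.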

The optimization works covered in this talk were done in collaboration
with Sanjeev Arora, Elad Hazan, Noah Golowich and Wei Hu. The
expressiveness works were with Amnon Shashua, Or Sharir, Yoav Levine,
Ronen Tamari and

Nadav Cohen is a postdoctoral member of the School of Mathematics at the
Institute for Advanced Study. His research focuses on the theoretical
and algorithmic foundations of deep learning. In particular, he is
interested in mathematically analyzing aspects of expressiveness,
optimization and generalization, with the goal of deriving theoretically
founded procedures and algorithms that will improve practical
performance. Nadav earned his PhD at the School of Computer Science and
Engineering of the Hebrew University of Jerusalem, under the supervision
of Prof. Amnon Shashua. Prior to that, he obtained a BSc in electrical
engineering and a BSc in mathematics (both summa cum laude) through the
Technion Excellence Program for distinguished undergraduates. For his
contributions to the theoretical understanding of deep learning, Nadav
has received several awards, including the Google Doctoral Fellowship in
Machine Learning, the Rothschild Postdoctoral Fellowship, and the
Zuckerman Postdoctoral Fellowship.