Shai Vaknin, M.Sc. seminar lecture
Thursday, 26.10.2017, 13:30
Choosing hyperparameters, such as the learning rate, when training deep neural networks is more of an art than a science.
However, correctly setting them is often crucial to the success of the training process.
Therefore, the common practice is to try many options using various rules of thumb.
As a result, the search for the best hyperparameters is the most time-consuming phase of designing the network, and often the main source of frustration.
We propose two techniques to dynamically adapt the global learning rate.
Our approach is akin to a "judge" and a "defendant".
The "judge" is the current optimization step, and the "defendant" is the following one.
The judge assesses the quality of the learning rate by observing the defendant's behavior.
One method is based on SESOP as a judge, while the other relies on first-order algorithms (SGD, Nesterov, Adam, etc.).
We evaluate our learning rate heuristic on several models and datasets, and show encouraging results.
For several test cases, our method is able to speed up convergence with minimal loss of accuracy,
and to simplify tuning, as it helps recover from a poor choice of the initial learning rate.
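
To make the idea of adapting a global learning rate from the behavior of the next step more concrete, here is a minimal Python sketch. It is not the method presented in the talk: it swaps in a simple "bold driver"-style rule (grow the rate after a step that lowers the loss, shrink it after a step that does not), and the toy quadratic objective, the function names, and the `grow`/`shrink` factors are all assumptions chosen for illustration.

```python
# Toy illustration only: a "bold driver"-style rule, NOT the method from the talk.
# The objective, names, and constants below are hypothetical.

CURVATURE = (1.0, 10.0)  # an ill-conditioned quadratic: f(x) = 0.5 * sum(c_i * x_i^2)


def loss(x):
    """Value of the toy quadratic objective at point x."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURE, x))


def grad(x):
    """Gradient of the toy quadratic objective at point x."""
    return [c * xi for c, xi in zip(CURVATURE, x)]


def train(x0, lr, steps=100, grow=1.1, shrink=0.5):
    """Gradient descent whose global learning rate is adjusted after every step.

    A proposed step is kept only if it lowers the loss; the learning rate is
    then multiplied by `grow`. Otherwise the step is discarded and the
    learning rate is multiplied by `shrink`.
    """
    x = list(x0)
    current_loss = loss(x)
    for _ in range(steps):
        g = grad(x)
        candidate = [xi - lr * gi for xi, gi in zip(x, g)]
        candidate_loss = loss(candidate)
        if candidate_loss < current_loss:
            # The step behaved well: accept it and be slightly bolder.
            x, current_loss = candidate, candidate_loss
            lr *= grow
        else:
            # The step made things worse: reject it and be more careful.
            lr *= shrink
    return x, lr, current_loss


if __name__ == "__main__":
    # Start from a deliberately poor learning rate (plain gradient descent
    # with a fixed rate of 1.0 diverges on this objective).
    x, lr, final_loss = train([1.0, 1.0], lr=1.0)
    print("final loss:", final_loss, "final learning rate:", lr)
```

Even this crude rule recovers from the poor initial rate on the toy problem, which is the kind of robustness the abstract is aiming for; the techniques in the talk use a more informed judge (SESOP or a first-order optimizer) rather than a bare loss comparison.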