Technion, Computer Science Department


Predicting a Better Future for Asynchronous SGD with DANA
Ido Hakimi, M.Sc. Thesis Seminar
Monday, 24.12.2018, 11:00
Taub 601
Distributed training can significantly reduce the training time of neural networks. Despite this potential, distributed training has not been widely adopted because the training process is difficult to scale. Existing methods suffer from slow convergence and low final accuracy when scaled to large clusters, and they often require substantial re-tuning of hyperparameters. We propose DANA, a novel approach that scales to large clusters while keeping final accuracy and convergence speed similar to those of a single worker. DANA estimates the future values of the model parameters by adapting Nesterov Accelerated Gradient to a distributed setting, and thereby mitigates gradient staleness (a worker's gradient is applied only after other workers have already changed the parameters it was computed on), one of the main difficulties in scaling SGD to more workers. An evaluation on three state-of-the-art network architectures and three datasets shows that DANA scales as well as or better than existing work, without tuning any hyperparameters or modifying the learning-rate schedule. For example, DANA achieves 75.73% accuracy on ImageNet when training ResNet-50 with 16 workers, similar to the non-distributed baseline.
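To make the look-ahead idea concrete, here is a minimal Python sketch on a one-dimensional quadratic. It is not DANA: the abstract does not give DANA's update rule, so `lookahead`, `train`, and the extrapolation rule are hypothetical illustrations. Standard Nesterov Accelerated Gradient evaluates the gradient at θ + γv, one momentum step ahead of the current parameters. When a gradient arrives `delay` iterations late, the sketch instead extrapolates `delay + 1` momentum-only steps, assuming (as a simplification) that no gradient updates land in the meantime.

```python
from collections import deque
from typing import Callable

Grad = Callable[[float], float]


def lookahead(theta: float, velocity: float, momentum: float, steps: int) -> float:
    """Hypothetical estimate of theta after `steps` momentum-only updates.

    With steps=1 this is the Nesterov look-ahead point theta + momentum * velocity.
    """
    estimate, v = theta, velocity
    for _ in range(steps):
        v *= momentum  # velocity decays, no new gradient is assumed
        estimate += v
    return estimate


def train(grad: Grad, theta0: float, lr: float, momentum: float, iters: int,
          delay: int = 0, predict: bool = True) -> float:
    """Toy momentum SGD in which every gradient is applied `delay` iterations late.

    delay=0 with predict=True is standard single-worker NAG.
    predict=False evaluates the gradient at the current (soon stale) parameters.
    The first `delay` iterations perform no update while the queue fills.
    """
    theta, v = theta0, 0.0
    in_flight: deque = deque()  # gradients computed but not yet applied
    for _ in range(iters):
        point = lookahead(theta, v, momentum, delay + 1) if predict else theta
        in_flight.append(grad(point))
        if len(in_flight) > delay:
            g = in_flight.popleft()  # oldest gradient, computed `delay` iterations ago
            v = momentum * v - lr * g
            theta += v
    return theta


if __name__ == "__main__":
    quadratic_grad = lambda x: x  # gradient of f(x) = x**2 / 2
    for predict in (False, True):
        print(predict, train(quadratic_grad, 1.0, 0.1, 0.9, 300, delay=4, predict=predict))
```

Running the script compares stale and look-ahead gradients on this toy problem; a single quadratic says nothing about the ImageNet results reported in the abstract.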