Netanel Raviv (CS, Technion)
Stochastic Gradient Descent (SGD) is a popular method for learning classes of linear predictors. When the training set is large, the computation of the gradient becomes a bottleneck in SGD, and it is therefore common to distribute this computation among worker nodes. However, in distributed computation scenarios, stragglers (i.e., slow or unavailable nodes) may cause considerable delay, and schemes for straggler mitigation are essential.
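To make the bottleneck concrete, here is a minimal sketch (not from the paper; the least-squares model, shard sizes, and variable names are illustrative assumptions) of distributing one gradient computation across workers: the master can only form the full gradient once every worker has replied, so a single straggler stalls the SGD step.

```python
import numpy as np

# Toy least-squares setup: the full gradient of 0.5*||Xw - y||^2 is X.T @ (X @ w - y).
rng = np.random.default_rng(0)
X = rng.standard_normal((9, 4))
y = rng.standard_normal(9)
w = np.zeros(4)

# Split the rows among 3 workers; each computes the gradient on its own shard.
shards = np.array_split(np.arange(9), 3)
partials = [X[idx].T @ (X[idx] @ w - y[idx]) for idx in shards]

# The master must wait for ALL workers before summing, so one slow or
# unavailable worker delays the entire update.
full_grad = X.T @ (X @ w - y)
assert np.allclose(sum(partials), full_grad)
```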
Tandon et al. recently suggested mitigating stragglers by coding across gradients, and gave a randomized construction. In this paper we obtain a comparable deterministic scheme by employing cyclic MDS codes. In addition, we suggest replacing the exact computation of the gradient with an approximate one; this technique, which stems from adjacency matrices of expander graphs, drastically increases the straggler tolerance.
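The idea of coding across gradients can be illustrated with a small hand-built instance in the spirit of Tandon et al. (this is a standard toy example with n = 3 workers tolerating s = 1 straggler, not the paper's cyclic-MDS construction; the matrix B and all names here are illustrative assumptions). Each worker returns a fixed linear combination of two partial gradients, and the rows of B are chosen so that the all-ones vector lies in the span of any two of them, hence the full gradient is recoverable from any two workers.

```python
import numpy as np

# Encoding matrix B: row i gives the coefficients worker i applies to the
# partial gradients g1, g2, g3. Any two rows of B span (1, 1, 1), so any
# single straggler can be tolerated.
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])

# Toy partial gradients as rows of G; the goal is their sum g1 + g2 + g3.
rng = np.random.default_rng(1)
G = rng.standard_normal((3, 4))
full_grad = G.sum(axis=0)

# Worker i transmits B[i] @ G. For each pair of surviving workers, find
# coefficients a with a @ B[survivors] = (1, 1, 1); then a decodes the sum.
sent = B @ G
for survivors in [(0, 1), (0, 2), (1, 2)]:
    rows = list(survivors)
    a, *_ = np.linalg.lstsq(B[rows].T, np.ones(3), rcond=None)
    decoded = a @ sent[rows]
    assert np.allclose(decoded, full_grad)
```

The paper's contribution replaces such ad hoc or randomized choices of B with a deterministic one built from cyclic MDS codes, and the expander-graph variant trades exact recovery of the sum for an approximation with far higher straggler tolerance.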