Keywords: stochastic gradient descent; butterfly mixing; machine learning; data mining; sequential algorithms
Abstract: Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient steps are interleaved with the k stages of a butterfly network on 2^k nodes. UDP-based butterfly mix steps should be extremely fast and failure-tolerant, and convergence is almost as fast as a full mix (AllReduce) on every step.
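To make the interleaving concrete, here is a minimal single-process sketch of the schedule the abstract describes, with the cluster's 2^k nodes simulated as rows of a matrix: at SGD step t, each node takes a local gradient step, then exchanges and averages its model with the partner whose index differs in bit (t mod k), i.e. one butterfly stage per step, so every k consecutive steps amount to a full mix. The names butterfly_mix_sgd, grad_fn, models, and lr are illustrative, not from the paper, and the real system would run the exchanges over UDP between machines rather than in one process.

import numpy as np

def butterfly_mix_sgd(grad_fn, models, n_steps, lr=0.1):
    # models: (2**k, d) array, one model vector per simulated node.
    # grad_fn(i, w): stochastic gradient on node i's local data at w.
    n_nodes = models.shape[0]
    k = int(np.log2(n_nodes))          # stages in the butterfly network
    for t in range(n_steps):
        # local SGD step on every node
        for i in range(n_nodes):
            models[i] -= lr * grad_fn(i, models[i])
        # one butterfly stage: pair node i with i XOR 2**(t mod k),
        # and replace each model with the pair's average
        stride = 1 << (t % k)
        mixed = models.copy()
        for i in range(n_nodes):
            mixed[i] = 0.5 * (models[i] + models[i ^ stride])
        models[:] = mixed
    return models.mean(axis=0)

# Toy usage: 8 nodes minimizing the shared quadratic ||w - 1||^2 / 2,
# whose gradient at w is simply (w - 1).
rng = np.random.default_rng(0)
w = butterfly_mix_sgd(lambda i, w: w - 1.0,
                      rng.normal(size=(8, 4)), n_steps=30)

Note that each stage costs only one pairwise message per node, versus the log-depth synchronized tree of a full AllReduce, which is why the per-step communication is so much cheaper even though the models only become fully mixed after k stages.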