DAdam: A Consensus-Based Distributed Adaptive Gradient Method for Online Optimization

Abstract

Adaptive optimization methods, such as AdaGrad, RMSProp, and Adam, are widely used for solving large-scale machine learning problems. A number of schemes have been proposed in the literature to parallelize them, based on communication between peripheral nodes and a central node, but these incur high communication costs. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (DAdam) for online optimization over a decentralized network that enables data parallelization as well as decentralized computation. The method is particularly useful since it can accommodate settings where only access to local data is permitted. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms for certain classes of loss functions used in machine learning applications. We analyze the convergence properties of the proposed algorithm and provide a regret bound on the convergence rate of adaptive moment estimation methods in both online convex and non-convex settings. Empirical results demonstrate that DAdam also exhibits good performance in practice and compares favorably to competing online optimization methods.
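To make the abstract's idea of consensus-based distributed adaptive moment estimation concrete, the following is a minimal illustrative sketch, not the paper's exact DAdam recursion: each agent mixes its iterate with its neighbors through a doubly stochastic matrix over a ring graph and then takes a local Adam-style step using only the gradient of its own loss. All names, the toy quadratic losses, and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's exact DAdam update): each agent
# averages its iterate with its neighbors via a doubly stochastic mixing
# matrix, then applies a local Adam-style adaptive step using only the
# gradient of its own local loss.

np.random.seed(0)

n_agents, dim = 5, 3
alpha, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8

# Ring-graph mixing matrix (doubly stochastic): self-weight 0.5, neighbors 0.25.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

# Each agent holds a private quadratic loss f_i(x) = 0.5 * ||x - target_i||^2.
targets = np.random.randn(n_agents, dim)

def local_grad(i, x):
    return x - targets[i]

x = np.random.randn(n_agents, dim)   # per-agent iterates
m = np.zeros((n_agents, dim))        # first-moment estimates
v = np.zeros((n_agents, dim))        # second-moment estimates

for t in range(1, 501):
    x = W @ x                        # consensus step: mix with neighbors
    for i in range(n_agents):
        g = local_grad(i, x[i])      # gradient of the local loss only
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g**2
        m_hat = m[i] / (1 - beta1**t)  # bias-corrected moments
        v_hat = v[i] / (1 - beta2**t)
        x[i] -= alpha * m_hat / (np.sqrt(v_hat) + eps)

# All agents should approach a neighborhood of the minimizer of the
# average loss, which for these quadratics is the mean of the targets.
print("per-agent iterates:\n", x)
print("global minimizer:  ", targets.mean(axis=0))
```

In this toy setup no central node is involved: each agent communicates only with its two ring neighbors, which reflects the decentralized, data-parallel setting the abstract describes.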

Publication
IEEE Transactions on Signal Processing