Adaptive gradient-based optimizers such as AdaGrad and Adam are among the
methods of choice in modern machine learning. These methods maintain
second-order statistics for each parameter, so the optimizer state they keep
is as large as the model itself. For very large models, this memory overhead
restricts both the size of the model that can be trained and the number of
examples in a mini-batch. We describe a novel, simple, and flexible adaptive
optimization
method with sublinear memory cost that retains the benefits of per-parameter
adaptivity while allowing for larger models and mini-batches. We give
convergence guarantees for our method and demonstrate its effectiveness in
training very large deep models.
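
As a rough illustration (not from the paper), the short NumPy sketch below shows where the per-parameter memory overhead described above comes from in an AdaGrad-style update: the second-moment accumulator has exactly the same shape as the parameter tensor, so the optimizer state alone matches the model in size. The variable names and the toy quadratic objective are hypothetical.

import numpy as np

# Hypothetical toy setup: one parameter matrix and the loss sum(params**2).
rng = np.random.default_rng(0)
params = rng.normal(size=(1000, 1000))   # model parameters
g2_sum = np.zeros_like(params)           # per-parameter second-order statistics:
                                         # same shape as params, so the optimizer
                                         # state is as large as the model itself
lr, eps = 0.1, 1e-8

for step in range(100):
    grad = 2.0 * params                  # gradient of the toy quadratic loss
    g2_sum += grad ** 2                  # accumulate squared gradients (AdaGrad-style)
    params -= lr * grad / (np.sqrt(g2_sum) + eps)

# Training state is roughly double the parameter memory under this scheme.
print(params.nbytes + g2_sum.nbytes, "bytes of state vs", params.nbytes, "bytes of model")

A method with sublinear memory cost, as described in the abstract, would keep asymptotically less state than one accumulator entry per parameter, which is what allows larger models and mini-batches to fit in memory.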