

Poster

A New Perspective on Shampoo's Preconditioner

Depen Morwani · Itai Shapira · Nikhil Vyas · Eran Malach · Sham Kakade · Lucas Janson

Hall 3 + Hall 2B #326
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Shampoo, a second-order optimization algorithm that uses a Kronecker product preconditioner, has recently received increasing attention from the machine learning community. Despite its growing popularity, the theoretical foundations of Shampoo’s effectiveness are not well understood. The preconditioner used by Shampoo can be viewed as an approximation of either the Gauss–Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. Our key contribution is an explicit and novel connection between the optimal Kronecker product approximation of these matrices and the approximation made by Shampoo. This connection highlights a subtle but common misconception about Shampoo’s approximation. In particular, the square of the approximation used by the Shampoo optimizer is equivalent to a single step of the power iteration algorithm for computing the aforementioned optimal Kronecker product approximation. Across a variety of datasets and architectures, we empirically demonstrate that this single-step approximation is close to the optimal Kronecker product approximation. We also study the impact of batch gradients and the empirical Fisher on the quality of the Hessian approximation. Our findings not only advance the theoretical understanding of Shampoo but also illuminate potential pathways for enhancing its practical performance.
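The power-iteration connection can be checked numerically on a toy problem. Shampoo preconditions a matrix gradient G with L^{-1/4} ⊗ R^{-1/4}, where L = Σ G Gᵀ and R = Σ Gᵀ G, while full-matrix Adagrad uses H^{-1/2} with H = Σ vec(G) vec(G)ᵀ. Shampoo therefore implicitly approximates H by L^{1/2} ⊗ R^{1/2}, whose square is L ⊗ R. The sketch below uses the standard Van Loan–Pitsianis rearrangement, which turns the nearest-Kronecker-product problem into a rank-1 SVD. It then shows that one multiplication of the rearranged matrix with the vectorized identity recovers L (and, from the other side, R). Everything here is an illustrative assumption rather than the authors' code: the function names are hypothetical, vectorization is row-major, the gradients are synthetic (G_t = P Z_t Q), and each Kronecker approximation is rescaled by its least-squares optimal scalar, which may differ from the normalization used in the paper.

```python
import numpy as np

def gradient_covariance(grads):
    """Adagrad-style statistic H = sum_t vec(G_t) vec(G_t)^T (row-major vec)."""
    return sum(np.outer(G.reshape(-1), G.reshape(-1)) for G in grads)

def shampoo_factors(grads):
    """Shampoo's left/right statistics L = sum G G^T, R = sum G^T G."""
    L = sum(G @ G.T for G in grads)
    R = sum(G.T @ G for G in grads)
    return L, R

def rearrange(H, m, n):
    """Van Loan-Pitsianis rearrangement: M[(i,k),(j,l)] = H[(i,j),(k,l)],
    so that ||H - kron(A, B)||_F == ||M - vec(A) vec(B)^T||_F."""
    return H.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)

def optimal_kron(H, m, n):
    """Optimal Kronecker factors from the top singular pair of M."""
    U, S, Vt = np.linalg.svd(rearrange(H, m, n))
    A = np.sqrt(S[0]) * U[:, 0].reshape(m, m)
    B = np.sqrt(S[0]) * Vt[0].reshape(n, n)
    return A, B

def one_step_power_iteration(H, m, n):
    """One power-iteration step on M for each factor, started from identity."""
    M = rearrange(H, m, n)
    A = (M @ np.eye(n).reshape(-1)).reshape(m, m)
    B = (M.T @ np.eye(m).reshape(-1)).reshape(n, n)
    return A, B

def best_scaled_kron(H, A, B):
    """c * kron(A, B) with the scalar c minimizing the Frobenius error."""
    K = np.kron(A, B)
    return (np.sum(H * K) / np.sum(K * K)) * K

def psd_sqrt(X):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def rel_err(H, approx):
    return np.linalg.norm(H - approx) / np.linalg.norm(H)

# Hypothetical synthetic gradients with Kronecker-structured covariance.
rng = np.random.default_rng(0)
m, n, T = 4, 3, 64
P, Q = rng.standard_normal((m, m)), rng.standard_normal((n, n))
grads = [P @ rng.standard_normal((m, n)) @ Q for _ in range(T)]

H = gradient_covariance(grads)
L, R = shampoo_factors(grads)
A1, B1 = one_step_power_iteration(H, m, n)   # should coincide with L and R
A_opt, B_opt = optimal_kron(H, m, n)

err_opt = rel_err(H, np.kron(A_opt, B_opt))
err_sq = rel_err(H, best_scaled_kron(H, L, R))                          # L kron R
err_sh = rel_err(H, best_scaled_kron(H, psd_sqrt(L), psd_sqrt(R)))      # L^1/2 kron R^1/2
print(f"optimal: {err_opt:.4f}  L kron R: {err_sq:.4f}  "
      f"L^1/2 kron R^1/2: {err_sh:.4f}")
```

The identity `A1 == L`, `B1 == R` holds exactly (up to floating point) by construction. The printed errors depend on the synthetic data and are only a toy illustration, not a reproduction of the paper's experiments.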
