Back to all publications...
The Natural Neural Tangent Kernel: Neural Network Training Dynamics under Natural Gradient Descent
Gradient-based optimization methods have proven successful in learning complex, overparameterized neural networks from non-convex objectives. Yet, the precise theoretical relationship between gradient-based optimization methods, the resulting training dynamics, and generalization in deep neural networks (DNNs) remains unclear. In this work, we investigate the training dynamics of overparameterized DNNs of \emph{finite-width} under natural gradient descent. To do so, we take a function-space view of the training dynamics under natural gradient descent and derive a bound on the discrepancy between the DNN predictive distributions induced by linearized and non-linearized natural gradient descent. Unlike prior work, our bound quantifies the extent to which linearization of the training dynamics of finite-width DNNs affects DNN predictions on arbitrary test points.
Tim G. J. Rudner, Florian Wenzel, Yee Whye Teh, Yarin Gal
Contributed talk, NeurIPS Workshop on Bayesian Deep Learning, 2019
[Preprint]