Invariance is widely described as a desirable property of neural networks, but the mechanisms by which it benefits deep learning remain poorly understood. We show that building invariance into the model architecture via feature averaging provably tightens PAC-Bayes generalization bounds compared to data augmentation. Furthermore, through a link to the marginal likelihood and Bayesian model selection, we justify using the improvement in these bounds for model selection. Our key observation is that invariance does not just reduce variance in deep learning: it also changes the parameter-function mapping, and this leads to better provable guarantees for the model. We verify our theoretical results empirically on a permutation-invariant dataset.
Clare Lyle, Marta Kwiatkowska, Yarin Gal
14th Women in Machine Learning Workshop (WiML 2019)
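
As an illustration of the feature averaging mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-layer feature map (`base_features`) and averages its output over every permutation of the input coordinates, which makes the resulting features permutation-invariant by construction. Data augmentation, by contrast, would leave the feature map unchanged and train it on permuted copies of the inputs. All names, shapes, and the choice of `tanh` are assumptions made for this example.

```python
import itertools

import numpy as np


def base_features(x, w):
    """Hypothetical non-invariant feature map: a single tanh layer."""
    return np.tanh(w @ x)


def averaged_features(x, w):
    """Average the base features over all permutations of the input coordinates."""
    perms = itertools.permutations(range(len(x)))
    return np.mean([base_features(x[list(p)], w) for p in perms], axis=0)


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))   # assumed weights: 3 inputs -> 4 features
x = rng.normal(size=3)        # assumed example input
x_perm = x[[2, 0, 1]]         # the same input with its coordinates permuted

# The averaged features agree on x and its permutation; the base features do not.
invariant = np.allclose(averaged_features(x, w), averaged_features(x_perm, w))
base_differs = not np.allclose(base_features(x, w), base_features(x_perm, w))
```

Averaging over the full permutation group costs n! evaluations of the feature map for n input coordinates, so this direct form is only practical for small inputs.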