Memo No. 90, September 8, 2019

Theory III: Dynamics and Generalization in Deep Networks*

Andrzej Banburski^1, Qianli Liao^1, Brando Miranda^1, Tomaso Poggio^1, Lorenzo Rosasco^1, Fernanda De La Torre^1, and Jack Hidary^2

^1 Center for Brains, Minds and Machines, MIT
^1 CSAIL, MIT
^2 Alphabet (Google) X

Abstract

The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity – such as an explicit regularization term – in the training of deep networks. We show that a classical form of norm control, albeit a hidden one, is responsible for generalization in deep networks trained with gradient descent techniques. In particular, gradient descent induces a dynamics of the normalized weights that converges to a hyperbolic equilibrium. Our approach extends some of the results of Srebro from linear networks to deep networks and provides a new perspective on the implicit bias of gradient descent. The elusive complexity control we describe is responsible, at least in part, for the puzzling empirical finding of good generalization despite overparametrization by deep networks.

* This replaces previous versions of Theory IIIa and Theory IIIb.

This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
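To make the implicit bias mentioned in the abstract concrete, the following is a minimal numerical sketch (an illustrative assumption, not the paper's own construction): a linear classifier trained by plain gradient descent on an exponential loss over a toy linearly separable dataset. The unnormalized weight norm ||w|| keeps growing, while the normalized weights w/||w|| settle into a fixed (max-margin) direction. The dataset, learning rate, and iteration counts are arbitrary illustrative choices.

    # Minimal sketch of the implicit bias of gradient descent on separable data:
    # with an exponential loss, ||w|| grows without bound while w/||w|| converges.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy separable data: 50 points around (+1, +1) with label +1,
    # 50 points around (-1, -1) with label -1.
    X = np.vstack([rng.normal(loc=(+1.0, +1.0), scale=0.3, size=(50, 2)),
                   rng.normal(loc=(-1.0, -1.0), scale=0.3, size=(50, 2))])
    y = np.concatenate([np.ones(50), -np.ones(50)])

    w = 0.01 * rng.normal(size=2)   # small random initialization
    lr = 0.01                       # illustrative step size

    prev_dir = w / np.linalg.norm(w)
    for t in range(1, 20001):
        margins = y * (X @ w)
        # Gradient of L(w) = sum_n exp(-y_n <w, x_n>).
        grad = -(np.exp(-margins) * y) @ X
        w = w - lr * grad
        if t % 5000 == 0:
            direction = w / np.linalg.norm(w)
            drift = np.linalg.norm(direction - prev_dir)
            prev_dir = direction
            print(f"step {t:6d}  ||w|| = {np.linalg.norm(w):7.3f}  "
                  f"drift of w/||w|| over last 5000 steps = {drift:.2e}")

    # Expected behaviour: ||w|| grows slowly (roughly logarithmically in t),
    # while the drift of the normalized direction shrinks toward zero: the
    # normalized weights, not the raw ones, are what stabilize.

The exact numbers depend on the seed and step size; the qualitative pattern is what matters. It is this behavior of the normalized weights, rather than of the raw weights, that the paper argues provides the hidden norm control behind generalization in deep networks.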

References

[1] Kaifeng Lyu et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks. ICLR, 2019.
[2] Mikhail Belkin et al. Two models of double descent for weak features. SIAM J. Math. Data Sci., 2019.
[3] Nathan Srebro et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models. ICML, 2019.
[4] Lorenzo Rosasco et al. Theory III: Dynamics and Generalization in Deep Networks. arXiv, 2019.
[5] Phan-Minh Nguyen et al. Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks. arXiv, 2019.
[6] Ruosong Wang et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. ICML, 2019.
[7] Alexander Rakhlin et al. Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon. COLT, 2018.
[8] Yuan Cao et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks. arXiv, 2018.
[9] Yuanzhi Li et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. NeurIPS, 2018.
[10] Liwei Wang et al. Gradient Descent Finds Global Minima of Deep Neural Networks. ICML, 2018.
[11] Barnabás Póczos et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks. ICLR, 2018.
[12] Xiao Zhang et al. Learning One-hidden-layer ReLU Networks via Gradient Descent. AISTATS, 2018.
[13] Tomaso A. Poggio et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks. AISTATS, 2017.
[14] Adel Javanmard et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks. IEEE Transactions on Information Theory, 2017.
[15] Qiang Liu et al. On the Margin Theory of Feedforward Neural Networks. arXiv, 2018.
[16] Yuanzhi Li et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data. NeurIPS, 2018.
[17] Tengyuan Liang et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize. The Annals of Statistics, 2018.
[18] Wei Hu et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced. NeurIPS, 2018.
[19] Aleksander Madry et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift). NeurIPS, 2018.
[20] Andrea Montanari et al. A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences, 2018.
[21] Tomaso A. Poggio et al. Theory of Deep Learning IIb: Optimization Properties of SGD. arXiv, 2018.
[22] Yuandong Tian et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima. ICML, 2017.
[23] Nathan Srebro et al. The Implicit Bias of Gradient Descent on Separable Data. J. Mach. Learn. Res., 2017.
[24] Yuandong Tian et al. When is a Convolutional Filter Easy To Learn? ICLR, 2017.
[25] Inderjit S. Dhillon et al. Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels. arXiv, 2017.
[26] Guillermo Sapiro et al. Robust Large Margin Deep Neural Networks. IEEE Transactions on Signal Processing, 2017.
[27] Nathan Srebro et al. Exploring Generalization in Deep Learning. NIPS, 2017.
[28] Matus Telgarsky et al. Spectrally-normalized margin bounds for neural networks. NIPS, 2017.
[29] Inderjit S. Dhillon et al. Recovery Guarantees for One-hidden-layer Neural Networks. ICML, 2017.
[30] Yuanzhi Li et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation. NIPS, 2017.
[31] Noah Golowich et al. Musings on Deep Learning: Properties of SGD. 2017.
[32] Tomaso A. Poggio et al. Theory II: Landscape of the Empirical Risk in Deep Learning. arXiv, 2017.
[33] Michael I. Jordan et al. How to Escape Saddle Points Efficiently. ICML, 2017.
[34] Yuandong Tian et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis. ICML, 2017.
[35] Amit Daniely et al. SGD Learns the Conjugate Kernel Class of the Network. NIPS, 2017.
[36] Amir Globerson et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs. ICML, 2017.
[37] Matus Telgarsky et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis. COLT, 2017.
[38] Michael I. Jordan et al. Gradient Descent Only Converges to Minimizers. COLT, 2016.
[39] Tim Salimans et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. NIPS, 2016.
[40] Furong Huang et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition. COLT, 2015.
[41] Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML, 2015.
[42] Michael I. Jordan et al. Convexity, Classification, and Risk Bounds. 2006.
[43] Gábor Lugosi et al. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, 2004.
[44] Ji Zhu et al. Margin Maximizing Loss Functions. NIPS, 2003.
[45] Sun-Yuan Kung et al. On gradient adaptation with unit-norm constraints. IEEE Trans. Signal Process., 2000.
[46] Aleksej F. Filippov et al. Differential Equations with Discontinuous Righthand Sides. Mathematics and Its Applications, 1988.