Integral representation of the global minimizer

We have obtained an integral representation of the shallow neural network that attains the global minimum of its backpropagation (BP) training problem. Unpublished numerical simulations we conducted several years before this study suggested that such an integral representation might exist, but it had not been proven until now. First, we introduced a Hilbert space of coefficient functions and a reproducing kernel Hilbert space (RKHS) of hypotheses associated with the integral representation; the RKHS reflects the approximation ability of neural networks. Second, we established ridgelet analysis on the RKHS, in which the analytic properties of the integral representation become remarkably clear. Third, we reformulated BP training as an optimization problem over the space of coefficient functions and, following Tikhonov regularization theory, obtained a formal expression for the unique global minimizer. Finally, we demonstrated that this global minimizer is the shrink ridgelet transform. Because BP training is convex in the integral representation, while the relation between an integral representation and an ordinary finite network is not yet clear, we cannot immediately answer questions such as “Is every local minimum a global minimum?” for finite networks. However, the obtained integral representation provides an explicit expression for the global minimizer without linearity-like assumptions such as partial linearity or monotonicity. Furthermore, it indicates that the ordinary ridgelet transform provides the minimum-norm solution to the original training equation.
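
For orientation, here is a minimal sketch of the objects involved, written in common ridgelet-analysis notation; the symbols γ (coefficient function), σ (activation), and ψ (ridgelet), as well as the normalization of the transforms, are assumptions of this sketch rather than the paper's exact definitions. The integral representation replaces the finite sum over hidden units by an integral over all hidden parameters (a, b),

    f(x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a, b) \, \sigma(a \cdot x - b) \, da \, db,

so that γ plays the role of a continuum of output weights. The ridgelet transform of a target f with respect to ψ is, up to weighting conventions,

    (R_\psi f)(a, b) = \int_{\mathbb{R}^m} f(x) \, \overline{\psi(a \cdot x - b)} \, dx,

and, under an admissibility condition pairing ψ with σ, substituting γ = R_\psi f into the integral representation reconstructs f up to a constant factor.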

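In the same spirit, the reformulated training problem can be sketched as a Tikhonov problem in the space of coefficient functions; here S denotes an (assumed bounded) operator taking a coefficient function to the network outputs on the training data, y the training targets, and λ > 0 the regularization parameter, all introduced for illustration rather than taken from the paper:

    \min_{\gamma} \; \| S \gamma - y \|^2 + \lambda \| \gamma \|^2 .

Standard Tikhonov theory gives the unique minimizer in closed form,

    \gamma_\lambda = (S^* S + \lambda I)^{-1} S^* y,

a spectrally shrunken version of the minimum-norm least-squares solution \gamma_0 = S^{+} y recovered in the limit λ → 0 (when that limit exists). This is consistent with the statements above that the global minimizer is a shrink ridgelet transform, while the ordinary ridgelet transform supplies the minimum-norm solution.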