Learning in High Dimension Always Amounts to Extrapolation

The notions of interpolation and extrapolation are fundamental in various fields, from deep learning to function approximation. Interpolation occurs for a sample x whenever it falls inside or on the boundary of the given dataset's convex hull; extrapolation occurs when x falls outside that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets; in fact, many intuitions and theories rely on that assumption. We argue, empirically and theoretically, against those two points and demonstrate that on any high-dimensional (dimension > 100) dataset, interpolation almost surely never happens. These results challenge the validity of the current interpolation/extrapolation definition as an indicator of generalization performance.
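
The convex-hull definition of interpolation above can be checked directly: x lies in the convex hull of the training samples x_1, ..., x_n iff there exist weights lam_i >= 0 with sum(lam_i) = 1 and sum(lam_i * x_i) = x, which is a linear feasibility problem. Below is a minimal sketch, not taken from the paper's code; the helper name in_convex_hull and the Gaussian toy data are illustrative assumptions used only to show how quickly hull membership vanishes as dimension grows.

```python
# Sketch: decide convex-hull membership via a linear feasibility problem,
# then measure how often a fresh sample is "interpolated" as dimension grows.
import numpy as np
from scipy.optimize import linprog


def in_convex_hull(x, X):
    """Return True if x is inside or on the boundary of the convex hull of the rows of X."""
    n, d = X.shape
    # Find lam >= 0 with X.T @ lam = x and sum(lam) = 1 (pure feasibility, zero objective).
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success  # feasible => x is in the hull


# Toy illustration (assumed setup): 500 training points per dimension,
# new samples drawn from the same standard Gaussian.
rng = np.random.default_rng(0)
for d in (2, 10, 100):
    X = rng.standard_normal((500, d))
    hits = sum(in_convex_hull(rng.standard_normal(d), X) for _ in range(50))
    print(f"d={d}: {hits}/50 new samples fall inside the convex hull")
```

Under this setup the fraction of interpolated samples drops sharply with dimension, consistent with the claim that interpolation almost surely never happens once the dimension exceeds roughly 100.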
