Incorporating prior information in machine learning by creating virtual examples

One of the key problems in supervised learning is an insufficient amount of training data. The natural way for a learner to counter this problem and generalize successfully is to exploit prior information that may be available about the domain or that can be learned from prototypical examples. We discuss the idea of incorporating prior knowledge by creating virtual examples, thereby expanding the effective size of the training set. We show that, in some contexts, this idea is mathematically equivalent to incorporating the prior knowledge as a regularizer, suggesting that the strategy is well motivated. Creating virtual examples for real-world pattern recognition tasks is, however, highly nontrivial. We illustrate the idea with demonstrative examples from object recognition and speech recognition.
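As a concrete illustration (not code from the paper), the virtual-example strategy can be sketched as follows: each real training example is copied and perturbed by a small label-preserving transformation, and the copies are appended to the training set. Here the transformation is simple additive Gaussian noise, standing in for the domain-specific transforms (image rotations, mirror symmetries, speaker warps) that the paper discusses; the function name and parameters are illustrative.

```python
import numpy as np

def make_virtual_examples(X, y, n_copies=2, noise_scale=0.05, rng=None):
    """Expand a training set with virtual examples.

    Each virtual example is a copy of a real example perturbed by a
    small label-preserving transformation -- here additive Gaussian
    noise, a stand-in for domain-specific transforms such as image
    rotations or mirror views.
    """
    rng = np.random.default_rng(rng)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(0.0, noise_scale, size=X.shape))
        y_parts.append(y)  # the transform is assumed to preserve the label
    return np.concatenate(X_parts), np.concatenate(y_parts)

# A 10-example training set grows to 30 examples:
X = np.random.rand(10, 4)
y = np.arange(10) % 2
X_big, y_big = make_virtual_examples(X, y, n_copies=2)
```

Note that for this particular choice of transformation (small additive noise), training on the expanded set is the setting in which the equivalence to regularization mentioned in the abstract holds; richer transforms encode correspondingly richer prior knowledge.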
