To recognize shapes, first learn to generate images.
