Convolutional networks and applications in vision

Intelligent tasks, such as visual perception, auditory perception, and language understanding require the construction of good internal representations of the world (or "features")? which must be invariant to irrelevant variations of the input while, preserving relevant information. A major question for Machine Learning is how to learn such good features automatically. Convolutional Networks (ConvNets) are a biologically-inspired trainable architecture that can learn invariant features. Each stage in a ConvNets is composed of a filter bank, some nonlinearities, and feature pooling layers. With multiple stages, a ConvNet can learn multi-level hierarchies of features. While ConvNets have been successfully deployed in many commercial applications from OCR to video surveillance, they require large amounts of labeled training samples. We describe new unsupervised learning algorithms, and new non-linear stages that allow ConvNets to be trained with very few labeled samples. Applications to visual object recognition and vision navigation for off-road mobile robots are described.

[1]  Kunihiko Fukushima,et al.  Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[2]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[3]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[4]  Michael C. Mozer,et al.  Perception of multiple objects - a connectionist approach , 1991, Neural network modeling and connectionism.

[5]  Lawrence D. Jackel,et al.  An analog neural network processor with programmable topology , 1991 .

[6]  Michael C. Mozer,et al.  The perception of multiple objects , 1991 .

[7]  Lawrence D. Jackel,et al.  Application of the ANNA neural network chip to high-speed character recognition , 1992, IEEE Trans. Neural Networks.

[8]  R. Vaillant,et al.  An original approach for the localization of objects in images , 1993 .

[9]  R. Vaillant,et al.  Original approach for the localisation of objects in images , 1994 .

[10]  John C. Platt,et al.  A Convolutional Neural Network Hand Tracker , 1994, NIPS.

[11]  Steven Pigeon,et al.  VIP: an FPGA-based processor for image processing and neural networks , 1996, Proceedings of Fifth International Conference on Microelectronics for Neural Networks.

[12]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[13]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[14]  Stacy Holman Jones Torch , 1999 .

[15]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[16]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[17]  Christophe Garcia,et al.  Convolutional face finder: a neural architecture for fast and robust face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yann LeCun,et al.  Synergistic Face Detection and Pose Estimation with Energy-Based Models , 2004, J. Mach. Learn. Res..

[19]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Yann LeCun,et al.  Toward automatic phenotyping of developing embryos from videos , 2005, IEEE Transactions on Image Processing.

[21]  Yann LeCun,et al.  Off-Road Obstacle Avoidance through End-to-End Learning , 2005, NIPS.

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Ahmad Abdulkader,et al.  Two-Tier Approach for Arabic Offline Handwriting Recognition , 2006 .

[24]  Patrice Y. Simard,et al.  High Performance Convolutional Neural Networks for Document Processing , 2006 .

[25]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[26]  Kumar Chellapilla,et al.  A New Radical Based Approach to Offline Handwritten East-Asian Character Recognition , 2006 .

[27]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Fu Jie Huang,et al.  A Tutorial on Energy-Based Learning , 2006 .

[29]  Patrice Y. Simard,et al.  Optimally combining a cascade of classifiers , 2006, Electronic Imaging.

[30]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  Takashi Morie,et al.  Projection-Field-Type VLSI Convolutional Neural Networks Using Merged/Mixed Analog-Digital Approach , 2007, ICONIP.

[32]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[33]  Thomas Hofmann,et al.  Greedy Layer-Wise Training of Deep Networks , 2007 .

[34]  Michael Happold,et al.  Using Learned Features from 3D Data for Robot Navigation , 2007 .

[35]  Joseph F. Murray,et al.  Supervised Learning of Image Restoration with Convolutional Networks , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[36]  Yihong Gong,et al.  Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks , 2008, ECCV.

[37]  Eero P. Simoncelli,et al.  Nonlinear image representation using divisive normalization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Nicolas Pinto,et al.  Why is Real-World Visual Object Recognition Hard? , 2008, PLoS Comput. Biol..

[39]  Jason Weston,et al.  Deep learning via semi-supervised embedding , 2008, ICML '08.

[40]  H. Sebastian Seung,et al.  Natural Image Denoising with Convolutional Networks , 2008, NIPS.

[41]  Christophe Garcia,et al.  text Detection with Convolutional Neural Networks , 2008, VISAPP.

[42]  Yann LeCun,et al.  CNP: An FPGA-based processor for Convolutional Networks , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[43]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[44]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Yann LeCun,et al.  Learning long‐range vision for autonomous off‐road driving , 2009, J. Field Robotics.

[46]  R. Fergus,et al.  Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Gernot A. Fink,et al.  Face Detection Using GPU-Based Convolutional Neural Networks , 2009, CAIP.

[48]  Yann LeCun,et al.  An FPGA-based stream processor for embedded real-time vision with Convolutional Networks , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[49]  Yann LeCun,et al.  EBLearn: Open-Source Energy-Based Learning in C++ , 2009, 2009 21st IEEE International Conference on Tools with Artificial Intelligence.

[50]  Tobi Delbrück,et al.  CAVIAR: A 45k Neuron, 5M Synapse, 12G Connects/s AER Hardware Sensory–Processing– Learning–Actuating System for High-Speed Visual Object Recognition and Tracking , 2009, IEEE Transactions on Neural Networks.

[51]  Marc'Aurelio Ranzato,et al.  Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[52]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.