PROC. OF THE IEEE, NOVEMBER 1998

Gradient-Based Learning Applied to Document Recognition

Multilayer Neural Networks trained with the backpropagation algorithm constitute the best example of a successful Gradient-Based Learning technique. Given an appropriate network architecture, Gradient-Based Learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional Neural Networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques.

Real-life document recognition systems are composed of multiple modules, including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called Graph Transformer Networks (GTN), allows such multi-module systems to be trained globally using Gradient-Based methods so as to minimize an overall performance measure.

Two systems for on-line handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of Graph Transformer Networks. A Graph Transformer Network for reading bank checks is also described. It uses Convolutional Neural Network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
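The claim that Convolutional Neural Networks are designed to cope with the variability of 2D shapes can be illustrated with a small sketch. The code below is not the paper's architecture; it only shows, under simplifying assumptions, why sharing one kernel across all image positions helps: when the input shifts, the feature map shifts with it instead of changing arbitrarily. The function names `conv2d_valid` and `shift_right`, the 6x6 stroke image, and the 3x3 edge kernel are all hypothetical choices made for this illustration.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (fully overlapping positions only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def shift_right(image, n=1):
    """Shift every row n pixels to the right, padding with zeros on the left."""
    return [[0] * n + row[:-n] for row in image]


# Hypothetical 6x6 input: a vertical stroke in column 2.
image = [[0, 0, 1, 0, 0, 0] for _ in range(6)]

# Hypothetical 3x3 kernel that responds to vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

fm = conv2d_valid(image, kernel)
fm_shifted = conv2d_valid(shift_right(image), kernel)

# The response to the shifted stroke is the original response moved one column.
for row, row_shifted in zip(fm, fm_shifted):
    print(row, "->", row_shifted)
```

Each output row of the shifted feature map matches the original row moved one column to the right. A later subsampling layer can then reduce sensitivity to the exact position of the feature, which is the general idea behind tolerance to small shifts and distortions.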
