Image Segmentation and Recognition

We have constructed a system for recognizing multi-character images. This is a nontrivial extension of our previous work on single-character images: it is somewhat surprising that a very good single-character recognizer does not, in general, form a good basis for a multi-character recognizer. The correct solution depends on three key ideas: 1) a method for normalizing probabilities correctly, to preserve information about the quality of the segmentation; 2) a method for giving credit to multiple segmentations that assign the same interpretation to the image; and 3) a method that combines recognition and segmentation into a single adaptive process, trained to maximize the score of the right answer. We also discuss improved ways of analyzing recognizer performance. A major part of this technical report is devoted to giving our methods a good theoretical footing. In particular, we do not start by asserting that maximum likelihood is obviously the right thing to do; instead, the problem is formalized in terms of a probability measure, and the learning algorithm must then be arranged to make this probability conform to the customer's needs. This formulation can be applied to other segmentation problems, such as speech recognition. Our recognizer, built on these principles, works noticeably better than the previous state of the art. This work also appeared, with the same title and authors, in The Mathematics of Generalization: Proceedings of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning (Addison-Wesley).
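
To make ideas (1) and (2) concrete, here is a minimal sketch, not the report's implementation: a toy recognizer assigns scores to candidate segments of a two-digit image, every legal segmentation is enumerated, and each output string receives the sum of the scores of all segmentations that produce it, contrasted with keeping only the single best segmentation. All names (SEGMENT_SCORES, segmentations, score_interpretations) and all numeric scores are invented for illustration; in the actual system the per-segment scores come from a trained network, the probabilities are normalized as described in the report, and the whole pipeline is trained end to end to maximize the score of the correct answer (idea 3).

```python
from collections import defaultdict

# Hypothetical per-segment recognizer scores, made up for illustration only.
# In the report these come from a trained network, and normalizing them
# correctly is itself one of the key ideas; the toy tables below simply leave
# some probability mass unassigned, standing in for "this segment is not a
# well-formed character."
# Keys are (start, end) spans over candidate cut positions 0..3: cuts 0 and 3
# are the image edges, cuts 1 and 2 are two competing cut hypotheses between
# the two characters of a handwritten "14".
SEGMENT_SCORES = {
    (0, 1): {"1": 0.80, "7": 0.10},
    (1, 3): {"4": 0.70, "9": 0.10},
    (0, 2): {"1": 0.50, "7": 0.20},   # same first character, wider segment
    (2, 3): {"4": 0.60, "1": 0.20},
    (0, 3): {"4": 0.05},              # whole image forced into one character
}
NUM_CUTS = 3


def segmentations(start, end):
    """Yield every legal split of [start, end) into consecutive segments."""
    if start == end:
        yield []
        return
    for mid in range(start + 1, end + 1):
        if (start, mid) in SEGMENT_SCORES:
            for rest in segmentations(mid, end):
                yield [(start, mid)] + rest


def labelings(segments):
    """Yield every assignment of one character (and its score) per segment."""
    if not segments:
        yield []
        return
    first, rest = segments[0], segments[1:]
    for char, score in SEGMENT_SCORES[first].items():
        for tail in labelings(rest):
            yield [(char, score)] + tail


def score_interpretations(num_cuts):
    """Score each output string two ways: summed over all segmentations that
    produce it (credit for multiple segmentations) and via the single best
    segmentation only (Viterbi-style), for contrast."""
    summed = defaultdict(float)
    best_path = defaultdict(float)
    for seg in segmentations(0, num_cuts):
        for labeled in labelings(seg):
            string = "".join(char for char, _ in labeled)
            score = 1.0
            for _, s in labeled:
                score *= s
            summed[string] += score
            best_path[string] = max(best_path[string], score)
    return summed, best_path


if __name__ == "__main__":
    summed, best_path = score_interpretations(NUM_CUTS)
    for s in sorted(summed, key=summed.get, reverse=True):
        print(f"{s!r}: summed = {summed[s]:.3f}, "
              f"best single path = {best_path[s]:.3f}")
```

In this toy example the reading "14" is supported by two overlapping cut hypotheses, so its summed score (0.86) is well ahead of its best single-path score (0.56); a scorer that kept only the best segmentation would discard that corroborating evidence.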
