Combining deformable models and neural networks for handprinted digit recognition

In this thesis I develop a method for recognizing isolated handprinted digits using trainable deformable models. Each digit is modelled by a cubic B-spline whose basic shape is defined by the "home" positions of its control points. A Gaussian distribution over displacements of the control points away from their home locations defines a probability distribution over shapes. The quality of the match between a spline model and an image is measured as the likelihood of the data under a mixture of Gaussian "ink generators" placed along the length of the spline. Each spline model is adjusted to minimize an energy function that combines the deformation energy of the model with the negative log-likelihood of the data, using an elastic matching procedure that is a generalization of the Expectation Maximization (EM) algorithm. I show that the matching procedure can be sped up significantly by using a neural network to provide better starting points for the search.

The use of deformable models has a number of advantages. (1) After identifying the model most likely to have generated the data, the system produces not only a classification of the digit but also a rich description of the instantiation parameters; I have shown that these can be used to detect consistency of writing style within a string of digits. (2) During the process of explaining the image, generative models can perform recognition-driven segmentation. (3) Unlike many other recognition schemes, the method does not rely on pre-normalization of the input images, but can handle arbitrary scalings, translations, and a limited degree of image rotation. The main disadvantage of the method is that it requires much more computation than more conventional optical character recognition techniques.
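
The matching step described above can be illustrated with a minimal sketch. The Python fragment below shows one EM-style iteration of elastic matching under simplifying assumptions that are mine rather than the thesis's: an isotropic Gaussian deformation prior with precision alpha, circular ink generators of fixed variance sigma2, and no background-noise component. The E-step soft-assigns ink pixels to the Gaussian ink generators placed along the spline; the M-step re-estimates the control points, trading off fit to the ink against displacement from the home positions.

    import numpy as np

    def em_match_step(ctrl, home, ink, basis, sigma2=1.0, alpha=1.0):
        """One illustrative EM-style elastic-matching iteration (a sketch, not the thesis code).

        ctrl  : (C, 2) current control-point positions
        home  : (C, 2) home positions defining the model's basic shape
        ink   : (N, 2) coordinates of inked pixels in the image
        basis : (K, C) cubic B-spline basis weights placing K ink generators on the spline
        """
        # E-step: responsibilities of each ink generator for each ink pixel
        means = basis @ ctrl                                   # (K, 2) generator centres
        d2 = ((ink[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-0.5 * d2 / sigma2)
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12        # (N, K) soft assignments

        # M-step: penalized weighted least squares for the control points,
        # pulled toward the home positions by the deformation prior
        Nk = resp.sum(axis=0)                                  # expected ink per generator
        A = alpha * np.eye(len(ctrl)) + basis.T @ (Nk[:, None] * basis) / sigma2
        b = alpha * home + basis.T @ (resp.T @ ink) / sigma2
        return np.linalg.solve(A, b)                           # updated (C, 2) control points

A full matcher would also handle the scaling, translation, and rotation parameters and the neural-network initialization described above; this fragment only sketches the shape update.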