Feature Selection and Language Syntax in Text Recognition

There are many features that can be used to recognize images of text. The choice of a feature set is usually made intuitively to optimize performance in single character recognition. This approach to feature set selection ignores evidence about human processing during reading, which suggests that feature extraction occurs in parallel with the development of an understanding of the text. Feature extraction in human reading is a two-step process that can be framed as hypothesis generation and testing. The understanding process includes syntactic as well as semantic components. This paper presents a set of algorithms for text recognition that model the essence of human reading with two feature extraction stages and an understanding phase that uses information about the syntactic context between words. An objective is to discover how different feature sets affect the performance of the syntactic analysis. Statistical experiments show that a simple representation for syntax reduces the number of words in a large lexicon that can match an input word by about 20 percent. Also, the error rate is reduced as the power of the feature detectors is increased.
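The idea of using syntactic context to shrink the set of lexicon words that can match an input word can be illustrated with a minimal sketch. It is not the algorithm presented in this paper. It assumes a toy lexicon, a coarse word-shape feature (ascender, descender, or neither for each character), and a hand-written table of permitted transitions between syntactic classes. The names LEXICON, ALLOWED, shape, neighborhood, and filter_by_syntax are hypothetical and chosen only for illustration.

```python
# Hypothetical toy lexicon: word -> set of syntactic classes (illustrative only).
LEXICON = {
    "the": {"DET"},
    "dog": {"NOUN"},
    "dug": {"VERB"},
    "bog": {"NOUN"},
    "ran": {"VERB"},
    "run": {"VERB", "NOUN"},
    "ram": {"NOUN"},
}

# Hypothetical table of class pairs permitted for adjacent words.
ALLOWED = {("DET", "NOUN"), ("NOUN", "VERB"), ("VERB", "DET"), ("VERB", "NOUN")}

ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape(word):
    """Coarse feature description: one symbol per character."""
    return tuple(
        "A" if c in ASCENDERS else "D" if c in DESCENDERS else "x" for c in word
    )


def neighborhood(word_shape):
    """Hypothesis generation: all lexicon words whose shape matches."""
    return {w for w in LEXICON if shape(w) == word_shape}


def compatible(left_word, right_word):
    """True if some class of left_word may precede some class of right_word."""
    return any(
        (a, b) in ALLOWED for a in LEXICON[left_word] for b in LEXICON[right_word]
    )


def filter_by_syntax(neighborhoods):
    """Drop candidates with no compatible neighbor; repeat until stable."""
    hoods = [set(h) for h in neighborhoods]
    changed = True
    while changed:
        changed = False
        for i, hood in enumerate(hoods):
            keep = set()
            for w in hood:
                ok_left = i == 0 or any(compatible(l, w) for l in hoods[i - 1])
                ok_right = i == len(hoods) - 1 or any(
                    compatible(w, r) for r in hoods[i + 1]
                )
                if ok_left and ok_right:
                    keep.add(w)
            if keep != hood:
                hoods[i] = keep
                changed = True
    return hoods


# Simulate three input word images by the shapes of "the dog ran".
hoods = [neighborhood(shape(w)) for w in ["the", "dog", "ran"]]
filtered = filter_by_syntax(hoods)
print(sum(len(h) for h in hoods), "candidates before,",
      sum(len(h) for h in filtered), "after")
```

In this toy setting the candidates "dug" and "ram" are removed because no permitted class transition connects them to their neighbors. The counts carry no statistical meaning and are unrelated to the reduction of about 20 percent reported for the large lexicon. A more powerful feature description than shape would produce smaller neighborhoods before the syntactic filter is applied.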