Text recognition enhancement with a probabilistic lattice chart parser

A probabilistic lattice chart parser is proposed for improving the performance of a text recognition technique. The recognition system processes digital images of words and generates alternatives for the identity of each word. Local word collocation statistics and a probabilistic chart parsing algorithm are then used to determine the N best parses of each sentence from these alternatives. The approach tightly integrates text recognition and understanding, with the objective of processing images of unrestricted English text. A large-scale lexicon, which supports the system, was acquired by training on corpora of over 3,000,000 words. The focus is on the implementation and performance of the probabilistic lattice chart parser.
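To make the idea of N-best parsing over recognition alternatives concrete, the following is a minimal sketch, not the parser described in the paper. It assumes a toy grammar in Chomsky normal form, a "sausage" lattice with one list of candidate words per position, and a CKY-style chart that keeps the N most probable analyses per cell. The grammar, the lexical probabilities, the recognizer scores, and the names `BINARY`, `LEXICAL`, `n_best_parses`, and `leaves` are all hypothetical; word collocation statistics are not modeled here and would have to be folded into the candidate scores.

```python
import heapq
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right) -> probability.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 1.0,
    ("VP", "VB", "NP"): 1.0,
}
# Hypothetical lexical probabilities P(word | tag).
LEXICAL = {
    ("DT", "the"): 1.0,
    ("NN", "dog"): 0.5, ("NN", "cat"): 0.3, ("NN", "dig"): 0.2,
    ("VB", "saw"): 0.6, ("VB", "dig"): 0.4,
}

def n_best_parses(lattice, n=3, root="S"):
    """Return up to n (probability, tree) pairs for the lattice, best first.

    lattice: one list per word position, each holding (word, recognizer_score)
    alternatives for that position.
    """
    length = len(lattice)
    # chart[(i, j)][symbol] -> list of (probability, tree) pairs for span i..j
    chart = defaultdict(lambda: defaultdict(list))

    # Seed the chart with every tagged word alternative at every position.
    for i, alternatives in enumerate(lattice):
        for word, score in alternatives:
            for (tag, lex_word), p in LEXICAL.items():
                if lex_word == word:
                    chart[(i, i + 1)][tag].append((score * p, (tag, word)))
        for symbol in list(chart[(i, i + 1)]):
            entries = chart[(i, i + 1)][symbol]
            chart[(i, i + 1)][symbol] = heapq.nlargest(n, entries, key=lambda e: e[0])

    # Combine adjacent spans bottom-up, keeping only the n best per cell and symbol.
    for span in range(2, length + 1):
        for i in range(length - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), rule_p in BINARY.items():
                    for lp, ltree in chart[(i, k)].get(left, []):
                        for rp, rtree in chart[(k, j)].get(right, []):
                            chart[(i, j)][parent].append(
                                (rule_p * lp * rp, (parent, ltree, rtree))
                            )
            for symbol in list(chart[(i, j)]):
                entries = chart[(i, j)][symbol]
                chart[(i, j)][symbol] = heapq.nlargest(n, entries, key=lambda e: e[0])

    return chart[(0, length)].get(root, [])

def leaves(tree):
    """Read the chosen word sequence off a parse tree."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]

# Hypothetical recognizer output: candidate words with confidence scores.
lattice = [
    [("the", 0.9)],
    [("dog", 0.6), ("dig", 0.4)],
    [("saw", 0.7)],
    [("the", 1.0)],
    [("cat", 0.8), ("cot", 0.2)],
]
for prob, tree in n_best_parses(lattice, n=2):
    print(" ".join(leaves(tree)), prob)
```

With these made-up numbers, the highest-ranked parse reads "the dog saw the cat" and the second reads "the dig saw the cat". The candidate "cot" is dropped because the toy lexicon has no entry for it. Keeping only the N best entries per cell is enough to recover the N best full parses, because a derivation's probability is a product of the probabilities of its parts.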
