A Survey of Methods and Strategies in Character Segmentation

Character segmentation has long been a critical step in the OCR process. That recognition rates for isolated characters are higher than those obtained for words and connected character strings illustrates this well. Much of the recent progress in reading unconstrained printed and written text can be ascribed to more insightful handling of segmentation. This paper reviews these advances. The aim is to convey the range of techniques that have been developed, rather than simply to list sources. Segmentation methods are grouped under four main headings. The first, which may be termed the "classical" approach, consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection." The second class of methods avoids dissection and segments the image either explicitly, by classifying prespecified windows, or implicitly, by classifying subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two: it employs dissection together with recombination rules to define potential segments, but uses classification to select among the admissible segmentations offered by these subimages. Finally, holistic approaches, which avoid segmentation by recognizing entire character strings as units, are described.
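
As a minimal illustration of the "classical" dissection idea, the sketch below cuts a binary image wherever the vertical projection profile (ink pixels per column) falls to zero. This is only one simple dissection heuristic, chosen here for brevity; it is not claimed to be the method of any particular surveyed system. The function name `dissect_by_projection`, the `threshold` parameter, and the toy image are all assumptions made for the example.

```python
from typing import List, Tuple


def dissect_by_projection(
    image: List[List[int]], threshold: int = 0
) -> List[Tuple[int, int]]:
    """Split a binary image (1 = ink, 0 = background) into column ranges.

    Returns (start, end) pairs with `end` exclusive. Each pair delimits a
    candidate subimage that would then be passed to a classifier.
    `threshold` is a hypothetical tuning knob: columns with at most this
    many ink pixels are treated as gaps.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments: List[Tuple[int, int]] = []
    start = None  # column where the current run of ink began, if any
    for x, ink in enumerate(profile):
        if ink > threshold and start is None:
            start = x  # entering an inked region
        elif ink <= threshold and start is not None:
            segments.append((start, x))  # leaving an inked region
            start = None
    if start is not None:
        segments.append((start, width))  # ink runs to the right edge
    return segments


# Toy 3x10 image containing three separated blobs.
image = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
]
segments = dissect_by_projection(image)
print(segments)
```

A cut rule of this kind fails when characters touch or overlap, since no empty column separates them. That limitation is what motivates the recognition-based and hybrid strategies: the hybrid approach would treat such column ranges as an over-segmentation and let a classifier choose among their recombinations.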
