Generation and Use of Synthetic Training Data in Cursive Handwriting Recognition

Three methods for the synthetic generation of handwritten text are introduced and evaluated experimentally on a cursive handwriting recognition task, using an HMM-based recognizer. The experiments compare a traditional recognizer, trained on data produced by human writers, with a system trained on synthetic data only. With the most elaborate synthetic handwriting generation model, the synthetically trained system performed comparably to, or even slightly better than, the system trained on human handwriting.
