Transcript mapping for historic handwritten document images

Many scanned historical documents need to be indexed for archival and retrieval. Building a visual word spotting scheme for this purpose is challenging even when a transcription of the document image is available. We propose a framework that maps each word in the transcript to its associated word image in the document. A coarse word mapping based on document constraints is first used for lexicon reduction. The word mappings are then refined using word recognition results: a dynamic programming algorithm finds the best match that satisfies the constraints.
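
The abstract does not give the recurrence used in the refinement step, so the following is only a minimal sketch of how an order-preserving alignment between transcript words and word images could be computed with dynamic programming. It assumes an edit-distance style formulation; the `cost` matrix (a recognition-based dissimilarity between transcript word i and word image j), the `skip` penalty, and the function name `align_transcript` are hypothetical and not taken from the paper.

```python
def align_transcript(cost, skip=0.5):
    """Align transcript words to word images in reading order.

    cost[i][j] is an assumed dissimilarity between transcript word i
    and word image j (lower is better). `skip` is an assumed penalty
    for leaving a transcript word or a word image unmatched, which
    tolerates segmentation errors. Returns (total_cost, pairs), where
    pairs is a list of (word_index, image_index) matches.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0

    # D[i][j]: best cost of aligning the first i words with the first j images.
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        D[i][0] = i * skip
        back[i][0] = "skip_word"
    for j in range(1, m + 1):
        D[0][j] = j * skip
        back[0][j] = "skip_image"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (D[i - 1][j - 1] + cost[i - 1][j - 1], "match"),
                (D[i - 1][j] + skip, "skip_word"),
                (D[i][j - 1] + skip, "skip_image"),
            ]
            D[i][j], back[i][j] = min(options)

    # Trace back from the end to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif move == "skip_word":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return D[n][m], pairs


if __name__ == "__main__":
    # Two transcript words, three word images: the middle image stands
    # for an over-segmented fragment that should remain unmatched.
    example_cost = [
        [0.1, 0.9, 0.9],
        [0.9, 0.9, 0.1],
    ]
    total, pairs = align_transcript(example_cost, skip=0.3)
    print(pairs)
```

Because matches are only allowed along monotone paths through the table, the reading order of the transcript is preserved. In the example, the first word maps to the first image and the second word to the third image, with the middle image left unmatched. A coarse mapping such as the one described above could be used to restrict which cells of `cost` are evaluated at all.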
