Progress in camera-based document image analysis

The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity to supplement traditional scanning for document image acquisition. Digital cameras, whether attached to cellular phones or PDAs or used as standalone still or video devices, are highly mobile and easy to use; they can capture images of any kind of document, including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. There is clearly demand in many domains for robust solutions to the analysis of documents captured with such devices. Traditional scanner-based document analysis techniques provide a good reference and starting point, but they cannot be applied directly to camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction between content and background. In this paper we present a survey of application domains, technical challenges, and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We then discuss document analysis from a single camera-captured image as well as from multiple frames, and highlight some sample applications under development and feasible ideas for future development.
