Spatial recognition and grouping of text and graphics

We present a framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. The approach is completely spatial: it requires no ordering on the strokes and places no constraint on the relative placement of the shapes or symbols. Initially, the strokes on the page are linked in a proximity graph. A discriminative classifier then labels each connected subgraph either as one of the known symbols or as an invalid combination of strokes (e.g. one that includes strokes from two different symbols). This classifier combines the rendered image of the strokes with stroke features such as curvature and endpoints. A small subset of very efficient features is selected, yielding an extremely fast classifier. An A* search over connected subsets of the proximity graph simultaneously finds the optimal segmentation and recognition of all the strokes on the page. Experiments demonstrate that the system achieves 97% segmentation/recognition accuracy on a cross-validated shape dataset from 19 different writers.
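
The pipeline described above (proximity graph, scoring of connected stroke subsets, search for the cheapest segmentation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The distance threshold, the `max_size` cap on group size, all function names, and the `toy_cost` function standing in for the trained classifier are assumptions made for illustration. The search uses a zero heuristic, so it is A* reduced to uniform-cost search, which stays optimal as long as costs are non-negative.

```python
import heapq
from itertools import combinations
from math import hypot


def proximity_graph(strokes, threshold):
    """Link two strokes when any pair of their points lies within `threshold`."""
    graph = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if any(hypot(ax - bx, ay - by) <= threshold
               for ax, ay in strokes[i] for bx, by in strokes[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_subsets(graph, seed, allowed, max_size):
    """All connected subsets of `allowed` containing `seed`, up to `max_size` strokes."""
    found = {frozenset([seed])}
    frontier = [frozenset([seed])]
    while frontier:
        subset = frontier.pop()
        if len(subset) == max_size:
            continue
        neighbours = (set().union(*(graph[s] for s in subset)) & allowed) - subset
        for n in neighbours:
            grown = subset | {n}
            if grown not in found:
                found.add(grown)
                frontier.append(grown)
    return found


def segment(graph, cost, max_size=3):
    """Cheapest partition of all strokes into connected groups.

    Uniform-cost search (A* with a zero heuristic); `cost` must be non-negative.
    """
    everything = frozenset(graph)
    heap = [(0.0, 0, frozenset(), [])]  # (cost so far, tie-breaker, covered, groups)
    best = {frozenset(): 0.0}
    counter = 1
    while heap:
        g, _, covered, groups = heapq.heappop(heap)
        if covered == everything:
            return g, groups
        if g > best.get(covered, float("inf")):
            continue  # a cheaper way to cover the same strokes was already found
        remaining = everything - covered
        seed = min(remaining)  # always extend from the lowest uncovered stroke
        for subset in connected_subsets(graph, seed, remaining, max_size):
            new_g = g + cost(subset)
            new_covered = covered | subset
            if new_g < best.get(new_covered, float("inf")):
                best[new_covered] = new_g
                heapq.heappush(heap, (new_g, counter, new_covered, groups + [subset]))
                counter += 1
    return None


def toy_cost(subset):
    """Hypothetical stand-in for the classifier: low cost means a plausible symbol."""
    if subset == frozenset({0, 1}):
        return 0.5  # strokes 0 and 1 together look like a known symbol
    return 1.0 if len(subset) == 1 else 5.0


# Three strokes as point lists: 0 and 1 touch, 2 is far away.
strokes = [[(0, 0), (1, 0)], [(1, 0), (1, 1)], [(10, 10), (11, 10)]]
graph = proximity_graph(strokes, threshold=0.5)
total, groups = segment(graph, toy_cost)
print(total, groups)
```

In this toy example the search groups strokes 0 and 1 into one symbol and leaves stroke 2 on its own, because that partition has the lowest total cost. A real system would replace `toy_cost` with a score derived from the classifier's output and would benefit from an informative admissible heuristic.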
