Learning to Group Text Lines and Regions in Freeform Handwritten Notes

This paper proposes a machine learning approach to grouping problems in ink parsing. Starting from an initial segmentation, the method generates hypotheses by perturbing local configurations and processes them in a high-confidence-first fashion; the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier over a set of intuitive features. The framework has been successfully applied to grouping text lines and regions in complex freeform digital ink notes from real TabletPC users, and it holds great potential for solving many other grouping problems in the ink parsing and document image analysis domains.
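The high-confidence-first processing described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the 1-D toy data, the acceptance threshold, and the restriction to merge hypotheses are all assumptions made for illustration, and `toy_confidence` is a hypothetical stand-in for the learned AdaBoost decision-tree classifier.

```python
import heapq
from itertools import count


def toy_confidence(a, b):
    """Hypothetical stand-in for the learned classifier.

    Scores a merge hypothesis in (0, 1]; groups that lie closer score higher.
    """
    gap = min(abs(x - y) for x in a for y in b)
    return 1.0 / (1.0 + gap)


def group_high_confidence_first(items, confidence=toy_confidence, threshold=0.5):
    """Greedily accept merge hypotheses in order of decreasing confidence."""
    # Assumed initial segmentation: every item starts in its own group.
    groups = {i: (x,) for i, x in enumerate(items)}
    next_id = count(len(groups))
    heap = []  # max-heap emulated with negated scores

    def push(i, j):
        score = confidence(groups[i], groups[j])
        if score >= threshold:
            heapq.heappush(heap, (-score, i, j))

    # Generate the initial merge hypotheses for every pair of groups.
    ids = list(groups)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            push(ids[a], ids[b])

    while heap:
        _, i, j = heapq.heappop(heap)
        if i not in groups or j not in groups:
            continue  # stale hypothesis: one of its groups was already merged
        merged = groups.pop(i) + groups.pop(j)
        k = next(next_id)
        groups[k] = merged
        # Only hypotheses that involve the new group need to be re-scored.
        for other in list(groups):
            if other != k:
                push(other, k)

    return sorted(tuple(sorted(g)) for g in groups.values())


if __name__ == "__main__":
    # Toy vertical positions of strokes; two well-separated clusters.
    print(group_high_confidence_first([0.0, 0.5, 1.0, 10.0, 10.4]))
```

Accepting the most confident hypothesis first means that ambiguous decisions are deferred until the surrounding groups have already been settled. A real system would replace `toy_confidence` with a classifier trained on labeled ink and would generate hypotheses only among spatially neighboring groups.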
