Learning to Group Text Lines and Regions in Freeform Handwritten Notes

This paper proposes a machine learning approach to grouping problems in ink parsing. Starting from an initial segmentation, the method generates hypotheses by perturbing local configurations and processes them in a high-confidence-first fashion. The confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier that uses a set of intuitive features. The framework has been successfully applied to grouping text lines and regions in complex freeform digital ink notes from real TabletPC users. It holds great potential for solving many other grouping problems in the ink parsing and document image analysis domains.
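The high-confidence-first processing described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that the only hypothesis type is "merge two groups" (the paper's local perturbations may be richer), that each stroke is reduced to a single vertical coordinate, and that the confidence comes from a hand-written stand-in function, `toy_confidence`, rather than a trained AdaBoost decision-tree classifier. The names `group_high_confidence_first`, `toy_confidence`, and `threshold` are hypothetical.

```python
import heapq
import math
from itertools import count


def toy_confidence(g1, g2):
    """Stand-in for a learned classifier (assumption, not the paper's model).

    Scores how likely two groups of stroke y-coordinates belong to the
    same text line: close mean heights give a score near 1.
    """
    m1 = sum(g1) / len(g1)
    m2 = sum(g2) / len(g2)
    return math.exp(-abs(m1 - m2))


def group_high_confidence_first(initial_groups, confidence, threshold=0.5):
    """Greedily accept merge hypotheses, most confident first."""
    groups = {i: tuple(g) for i, g in enumerate(initial_groups)}
    next_id = count(len(groups))
    tie_breaker = count()  # keeps heap entries comparable on equal scores
    heap = []

    def push(a, b):
        # Queue the hypothesis "merge a and b" only if it is confident enough.
        score = confidence(groups[a], groups[b])
        if score >= threshold:
            heapq.heappush(heap, (-score, next(tie_breaker), a, b))

    ids = list(groups)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            push(a, b)

    while heap:
        _, _, a, b = heapq.heappop(heap)
        if a not in groups or b not in groups:
            continue  # stale hypothesis: one side was already merged
        new_id = next(next_id)
        groups[new_id] = groups.pop(a) + groups.pop(b)
        # The merge changes the local configuration, so re-score
        # hypotheses that involve the new group.
        for other in list(groups):
            if other != new_id:
                push(new_id, other)

    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    strokes = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
    print(group_high_confidence_first(strokes, toy_confidence))
```

In this sketch the priority queue ensures that the most confident hypothesis is always applied next, and hypotheses invalidated by an earlier merge are skipped when popped. Swapping `toy_confidence` for a trained classifier's score would leave the control flow unchanged.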
