Order Statistic Filter (OSF): A Novel Approach to Document Analysis

Page segmentation is a basic and important research subject in document analysis. Page segmentation methods fall into two major kinds: hierarchical and non-hierarchical. Most traditional techniques, such as the top-down and bottom-up approaches, are hierarchical. Although these two approaches are still in use, they are not effective for documents with high geometric complexity, and splitting a document requires iterative operations, which is time-consuming. A non-hierarchical method called the modified fractal signature (MFS) was presented in recent years. It overcomes these weaknesses; however, it requires computing the modified fractal signature, which makes the theory very complex. In this thesis, we present a new page segmentation approach that combines a Median Order Statistic Filter (MedOSF) with a Maximum Order Statistic Filter (MaxOSF) and is more direct and much simpler. We use the MedOSF to remove salt-and-pepper noise from the document and the MaxOSF to perform the page segmentation. In practice, the two filters not only adaptively process documents with high geometric complexity but also save considerable computing time.
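To make the two filters concrete, the following is a minimal sketch of a generic order statistic filter on a small binary image. It is not the authors' implementation. The abstract gives no window sizes, border handling, or pixel convention, so these are assumptions: ink pixels are 1 and background is 0, windows are square and clipped at the image border, and the names `order_statistic_filter`, `median_osf`, and `max_osf` are hypothetical.

```python
def order_statistic_filter(image, size, rank):
    """Replace each pixel with an order statistic of its size x size window.

    `rank` is a fraction in [0, 1]: 0.5 picks the (lower) median and 1.0
    picks the maximum. Windows are clipped at the image border (assumption).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect and sort the values inside the clipped window.
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            )
            out[r][c] = window[int(rank * (len(window) - 1))]
    return out


def median_osf(image, size=3):
    """Median filter: suppresses isolated salt-and-pepper pixels."""
    return order_statistic_filter(image, size, 0.5)


def max_osf(image, size=3):
    """Maximum filter: spreads ink (1) so that nearby strokes merge."""
    return order_statistic_filter(image, size, 1.0)


# A single isolated noise pixel on an otherwise empty 5 x 5 page.
noisy = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
clean = median_osf(noisy)

# Two ink pixels separated by a three-pixel gap on one scan line.
strokes = [[0, 1, 0, 0, 0, 1, 0]]
merged = max_osf(strokes, size=5)
```

In this sketch the median filter votes the isolated pixel out because it is outnumbered in every window that contains it, and the maximum filter with a wide enough window bridges the gap between the two ink pixels so that they form one connected region. How the paper turns the MaxOSF output into final page blocks is not stated in the abstract, so that step is left out here.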
