Document analysis techniques for the infinite memory multifunction machine

A system has recently been proposed that saves a digital copy of every document its users copy, print, or fax, without asking the user. Referred to as the Infinite Memory Multifunction Machine (IM³), this system largely solves the problem of lost documents. However, because it captures data indiscriminately, it is important that users have easy-to-use retrieval tools. Two document analysis techniques are described that simplify retrieval from large collections such as those the IM³ accumulates. The first detects duplicates or versions of a document. The second automatically files a document in a hierarchy familiar to the user. Experimental results are presented that illustrate the performance of each method.
