DjVu: analyzing and compressing scanned documents for Internet distribution

DjVu is an image compression technique designed for color documents scanned at high resolution. A typical color magazine page scanned at 300 dpi compresses to between 40 and 80 KB, which is 5 to 10 times smaller than JPEG at a similar level of subjective quality. The technique separates the foreground layer, which contains the text and drawings and requires high spatial resolution, from the background layer, which contains pictures and backgrounds and can be stored at lower resolution. The foreground is compressed with a bi-tonal image compression technique that takes advantage of similarities between character shapes. The background is compressed with a new progressive, wavelet-based compression method. A real-time, memory-efficient version of the decoder is available as a plug-in for popular Web browsers.
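The layer separation described above can be illustrated with a minimal Python sketch. It is not the DjVu segmenter or codec: the global gray-level threshold, the block-averaging downsampler, the function names `split_layers` and `raw_size_bytes`, and the 8.5 x 11 inch page size are all assumptions chosen for illustration. The sketch only shows the idea of keeping a full-resolution bi-tonal mask for text while storing the background at reduced resolution, plus the arithmetic behind the raw size of a 300 dpi color scan.

```python
def raw_size_bytes(width_in, height_in, dpi=300, bytes_per_pixel=3):
    """Uncompressed size of a scanned page (24-bit color by default)."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


def split_layers(pixels, threshold=128, factor=3):
    """Split a grayscale page into a foreground mask and a coarse background.

    pixels    : 2D list of gray values in 0..255 (rows of equal length)
    threshold : assumed global cutoff; darker pixels count as foreground
    factor    : assumed background subsampling factor

    Returns (mask, background):
      mask       : full-resolution 2D list, 1 = foreground, 0 = background
      background : reduced-resolution 2D list; each cell is the mean of the
                   non-foreground pixels in its block, or None when the
                   block is entirely covered by foreground
    """
    height, width = len(pixels), len(pixels[0])
    # Full-resolution bi-tonal mask: this is what preserves sharp text edges.
    mask = [[1 if p < threshold else 0 for p in row] for row in pixels]

    background = []
    for by in range(0, height, factor):
        out_row = []
        for bx in range(0, width, factor):
            # Average only pixels that are not hidden under the foreground.
            vals = [
                pixels[y][x]
                for y in range(by, min(by + factor, height))
                for x in range(bx, min(bx + factor, width))
                if not mask[y][x]
            ]
            out_row.append(sum(vals) // len(vals) if vals else None)
        background.append(out_row)
    return mask, background


if __name__ == "__main__":
    # Assumed page size of 8.5 x 11 inches, scanned at 300 dpi in 24-bit color.
    raw = raw_size_bytes(8.5, 11)
    print(raw)  # 25245000 bytes, about 25 MB before compression

    # Tiny 6 x 6 "page": light paper with one dark "ink" pixel.
    page = [[200] * 6 for _ in range(6)]
    page[0][0] = 10
    mask, background = split_layers(page)
    print(mask[0][0], background)
```

Under these assumptions the raw scan is about 25 MB, which gives a sense of how aggressive a 40 to 80 KB target is. In the sketch, background cells marked `None` are blocks fully hidden by foreground, so a real coder would be free to fill them with whatever values compress best.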
