Color documents on the Web with DjVu

We present a new image compression technique called "DjVu" that is specifically geared towards compressing high-resolution color scans of documents. With DjVu, a color magazine page scanned at 300 dpi typically occupies between 40 KB and 80 KB, a file roughly 5 to 10 times smaller than JPEG produces at a similar level of readability. Using a combination of hidden Markov model techniques and MDL-driven (minimum description length) heuristics, DjVu first classifies each pixel in the image as either foreground (text, drawings) or background (pictures, photos, paper texture). The pixel categories form a bitonal image, which is compressed using a pattern-matching technique that takes advantage of the similarities between character shapes. A progressive, wavelet-based compression technique, combined with a masking algorithm, is then used to compress the foreground and background images at lower resolutions while minimizing the number of bits spent on pixels that are not visible in the foreground and background planes. Encoders, decoders, and real-time, memory-efficient plug-ins for various web browsers are available for all the major platforms.
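
The layered representation described above can be illustrated with a small sketch. It is not the DjVu encoder: a fixed grayscale threshold stands in for the hidden Markov model and MDL-based segmentation, plain block averaging stands in for the wavelet coder, and the names `split_layers`, `reconstruct`, `threshold` and `factor` are hypothetical. What it does show is the general idea of a full-resolution bitonal mask that selects, pixel by pixel, between a foreground and a background layer stored at lower resolution, with masked-out pixels excluded from each layer.

```python
# Toy illustration of a mask / foreground / background layer split.
# Assumption: a fixed threshold replaces DjVu's actual segmenter, and
# block averaging replaces its wavelet-based coder.

def split_layers(image, threshold=128, factor=2):
    """Split a grayscale image (rows of ints, 0 = black, 255 = white)."""
    height, width = len(image), len(image[0])
    # Bitonal mask at full resolution: 1 marks a foreground (dark) pixel.
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    def downsample(select):
        # For each factor x factor block, average only the pixels that
        # belong to the selected layer. Pixels assigned to the other
        # layer are never displayed from this one, so they are ignored.
        layer = []
        for by in range(0, height, factor):
            out_row = []
            for bx in range(0, width, factor):
                vals = [image[y][x]
                        for y in range(by, min(by + factor, height))
                        for x in range(bx, min(bx + factor, width))
                        if mask[y][x] == select]
                # None marks a block with no visible pixel in this layer.
                out_row.append(round(sum(vals) / len(vals)) if vals else None)
            layer.append(out_row)
        return layer

    return mask, downsample(1), downsample(0)


def reconstruct(mask, foreground, background, factor=2):
    """Rebuild the image by reading each pixel from the layer the mask selects."""
    out = []
    for y, mask_row in enumerate(mask):
        row = []
        for x, bit in enumerate(mask_row):
            layer = foreground if bit else background
            row.append(layer[y // factor][x // factor])
        out.append(row)
    return out
```

In this sketch the blocks marked `None` are the "don't care" regions: no output pixel is ever read from them, which is the property a masking algorithm can exploit to avoid spending bits there.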
