High quality document image compression with "DjVu"

We present a new image compression technique called "DjVu" that is specifically geared towards the compression of high-resolution, high-quality images of scanned documents in color. This enables fast transmission of document images over low-speed connections, while faithfully reproducing the visual aspect of the document, including color, fonts, pictures, and paper texture. The DjVu compressor separates the text and drawings, which need a high spatial resolution, from the pictures and backgrounds, which are smoother and can be coded at a lower spatial resolution. Then, several novel techniques are used to maximize the compression ratio: the bi-level foreground image is encoded with AT&T's proposal to the new JBIG2 fax standard, and a new wavelet-based compression method is used for the backgrounds and pictures. Both techniques use a new adaptive binary arithmetic coder called the Z-coder. A typical magazine page in color at 300 dpi can be compressed down to between 40 and 60 KB, approximately 5 to 10 times better than JPEG for a similar level of subjective quality. A real-time, memory-efficient version of the decoder was implemented, and is available as a plug-in for popular web browsers.
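The layer separation described in the abstract can be illustrated with a small sketch. This is not the DjVu segmentation algorithm, which the abstract does not detail. A plain intensity threshold stands in for it, and the function name `separate_layers` and its parameters `threshold`, `factor`, and `fill` are hypothetical choices made for illustration. The sketch only shows the general idea: a bi-level mask kept at full resolution for text and drawings, and a background averaged down to a lower resolution using the non-text pixels only.

```python
def separate_layers(image, threshold=128, factor=2, fill=255):
    """Split a grayscale page (list of rows of 0-255 ints) into two layers.

    Assumption: dark pixels are text or line art. The real DjVu segmenter
    is more elaborate; a fixed threshold is used here only as a stand-in.
    """
    height, width = len(image), len(image[0])

    # Full-resolution bi-level mask: 1 = foreground (text, drawings).
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    # Low-resolution background: average each factor x factor block,
    # ignoring pixels that the mask assigns to the foreground.
    background = []
    for by in range(0, height, factor):
        out_row = []
        for bx in range(0, width, factor):
            samples = [
                image[y][x]
                for y in range(by, min(by + factor, height))
                for x in range(bx, min(bx + factor, width))
                if mask[y][x] == 0
            ]
            # A block fully covered by text has no background sample;
            # fall back to the (hypothetical) fill value.
            out_row.append(sum(samples) // len(samples) if samples else fill)
        background.append(out_row)
    return mask, background


page = [
    [250, 250, 10, 250],
    [250, 250, 10, 250],
    [240, 240, 240, 240],
    [240, 240, 20, 20],
]
mask, background = separate_layers(page, threshold=128, factor=2)
```

In this toy setting the mask would then go to a bi-level coder and the smaller background to a continuous-tone coder, which mirrors the split between the JBIG2-style and wavelet-based coders named in the abstract.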
