Arithmetic coding revisited

During its long gestation in the 1970s and early 1980s, arithmetic coding was widely regarded more as an academic curiosity than a practical coding technique. One factor that helped it gain the popularity it enjoys today was the publication in 1987 of source code for a multi-symbol arithmetic coder in Communications of the ACM. Now (1995), our understanding of arithmetic coding has further matured, and it is timely to review the components of that implementation and summarise the improvements that we and other authors have developed since then. We also describe a novel method for performing the underlying calculation needed for arithmetic coding. Accompanying the paper is a "Mark II" implementation that incorporates the improvements we suggest. The areas examined include: changes to the coding procedure that reduce the number of multiplications and divisions and permit them to be done to low precision; the increased range of probability approximations and alphabet sizes that can be supported using limited-precision calculation; data structures to support arithmetic coding on large alphabets; the interface between the modelling and coding subsystems; and the use of enhanced models to allow high-performance compression. For each of these areas, we consider how the new implementation differs from the CACM package.
