Parallel algorithms for Burrows-Wheeler compression and decompression

We present work-optimal PRAM algorithms for Burrows-Wheeler (BW) compression and decompression of strings over a constant alphabet. For a string of length n, the compression algorithm has depth O(log^2 n) and the corresponding decompression algorithm has depth O(log n). These appear to be the first polylogarithmic-time, work-optimal parallel algorithms for any standard lossless compression scheme. The algorithms for the individual stages of compression and decompression may also be of independent interest: (1) a novel O(log n)-time, O(n)-work PRAM algorithm for Huffman decoding; (2) original insights into the stages of BW compression and decompression that expose parallelism that was not readily apparent, so that each stage maps to an elementary parallel routine with an O(log n)-time, O(n)-work solution: (i) for several stages, prefix sums with an appropriately defined associative binary operator, and (ii) for the final stage of decompression, list ranking. Follow-up empirical work suggests potential for considerable practical speedups on a PRAM-driven many-core architecture, in contrast to the negative results reported at the time for common commercial platforms.
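To illustrate how the final stage of decompression can be viewed as list ranking, the following Python sketch inverts a BW transform by ranking the linked list formed by the last-to-first mapping. It is a sequential simulation under stated assumptions, not the algorithm of the paper: the names `bwt`, `lf_mapping`, `list_rank`, and `inverse_bwt` are hypothetical, a `"\0"` sentinel is assumed to mark the end of the input, and the ranking step uses simple pointer jumping, which takes O(log n) rounds but O(n log n) work and is therefore not work-optimal.

```python
def bwt(s, sentinel="\0"):
    """Naive BW transform for illustration; s must not contain the sentinel."""
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def lf_mapping(last):
    """Map each position in the last column to its position in the first column.

    A stable sort of positions by character reproduces the first column,
    so the k-th occurrence of a character in the last column is paired
    with its k-th occurrence in the first column.
    """
    order = sorted(range(len(last)), key=lambda i: last[i])  # sorted() is stable
    lf = [0] * len(last)
    for first_pos, last_pos in enumerate(order):
        lf[last_pos] = first_pos
    return lf


def list_rank(nxt, tail):
    """Distance from every node to `tail`, computed by pointer jumping.

    Each round updates all nodes from the previous round's values, which
    mimics one synchronous PRAM step. This simple scheme is an assumption
    of the sketch and is not work-optimal.
    """
    n = len(nxt)
    nxt = list(nxt)
    nxt[tail] = tail                      # cut the cycle: the tail points to itself
    rank = [0 if i == tail else 1 for i in range(n)]
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank


def inverse_bwt(last, sentinel="\0"):
    """Recover the original string from its BW transform via list ranking."""
    lf = lf_mapping(last)
    tail = last.index(sentinel)           # row whose last character is the sentinel
    rank = list_rank(lf, tail)
    out = [""] * (len(last) - 1)
    for i, r in enumerate(rank):
        if r > 0:
            out[r - 1] = last[i]          # a node at distance r holds character r-1
    return "".join(out)


if __name__ == "__main__":
    text = "banana"
    assert inverse_bwt(bwt(text)) == text
```

Once the ranks are known, every output character is written independently of the others, which is what makes this stage amenable to parallel execution; a work-optimal variant would replace the pointer-jumping loop with an O(n)-work list-ranking routine.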
