\neural" Redundancy Reduction for Text Compression

The components of most real-world patterns and pattern sequences carry redundant information. Most pattern classifiers (e.g. statistical classifiers and neural nets), however, work better if pattern components are non-redundant. Previous papers by this author introduced various unsupervised "neural" learning algorithms that transform patterns and pattern sequences into less redundant patterns without loss of information. Experiments with time series prediction tasks demonstrated that conventional gradient-based classification techniques can greatly benefit from this kind of unsupervised pre-processing. Encouraged by these earlier results, the paper at hand compares conventional data compression methods to a "neural" method: a neural net is used in conjunction with a statistical coding technique to compress text files without loss of information. The method is applied to short newspaper articles. The compression ratios obtained exceed those of the widely used, asymptotically optimal Lempel-Ziv algorithm (which forms the basis of the UNIX utilities "compress" and "gzip"). A disadvantage of the method, however, is its high computational complexity.
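To illustrate the general principle of pairing a predictive model with a statistical coder, the sketch below substitutes a simple character bigram counter for the paper's neural predictor and reports the ideal arithmetic-coding length rather than emitting an actual bitstream. The alphabet, the Laplace smoothing, and the names (`BigramPredictor`, `ideal_code_length`) are illustrative assumptions, not details from the paper.

```python
import math
from collections import defaultdict

# Sketch of predictive text compression: a model assigns a probability to
# each next character given its context, and a statistical coder (e.g. an
# arithmetic coder) can then encode that character in about -log2(p) bits.
# A bigram counter stands in here for the neural predictor; the alphabet
# and add-one smoothing are illustrative assumptions.

ALPHABET = [chr(c) for c in range(32, 127)] + ["\n"]

class BigramPredictor:
    """P(next char | previous char), with add-one (Laplace) smoothing."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, context, char):
        self.counts[context][char] += 1

    def prob(self, context, char):
        ctx = self.counts[context]
        total = sum(ctx.values()) + len(ALPHABET)  # smoothed denominator
        return (ctx[char] + 1) / total

def ideal_code_length(text, predictor):
    """Bits an ideal arithmetic coder would need, coding adaptively:
    each character is coded under the current model, then the model is
    updated, so encoder and decoder stay synchronized."""
    bits = 0.0
    context = ""
    for char in text:
        bits += -math.log2(predictor.prob(context, char))
        predictor.update(context, char)
        context = char
    return bits

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog. " * 40
    bits = ideal_code_length(sample, BigramPredictor())
    print(f"raw:        {8 * len(sample)} bits")
    print(f"predictive: {bits:.0f} bits (ratio {8 * len(sample) / bits:.2f})")
```

The better the predictor, the fewer bits per character; replacing the bigram counter with a stronger sequence model (such as the neural predictor the paper describes) improves the ratio at the cost of the computational overhead noted above.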