Data compression in multiterminal statistical inference: linear-threshold encoding

When correlated letters are generated from two or more information sources at different locations, we need to transmit the observed data to a common destination in order to estimate or test the joint probability distribution of the sources. When the data must be compressed separately at each location, what is the optimal compression scheme? This is a fundamental problem of multiterminal statistical inference proposed by T. Berger [1], and it still remains unsolved. We propose a new idea, linear-threshold encoding, for data compression, and study the performance of this class of schemes using a simple pair of binary information sources. For estimating or testing the correlation of the two sources, we show that, when the correlation is weak, a simple (trivial) encoding in which each encoded bit depends on only one original letter is optimal; that is, no substantial encoding takes place, and the overflowing letters are simply discarded. As the correlation becomes stronger, it is better to encode each bit from a number of letters, for example, by taking the majority of three letters when the transmission rate is 1/3. Finally, when the correlation is very strong, it is better to encode each bit using all the letters, where weighted majority decision plays a fundamental role.
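As an illustrative sketch (not from the paper itself), the rate-1/3 majority encoding mentioned above can be simulated as follows. We assume a toy model in which the second binary source is the first passed through a binary symmetric channel with flip probability `p_flip`; all function names and parameters here are hypothetical, chosen only for illustration.

```python
import random

def majority3_encode(bits):
    """Rate-1/3 linear-threshold encoding: each output bit is the
    majority (threshold 2) of a block of three source letters."""
    usable = len(bits) - len(bits) % 3
    return [1 if sum(bits[i:i + 3]) >= 2 else 0
            for i in range(0, usable, 3)]

def correlated_pair(n, p_flip, rng):
    """Toy correlated binary sources: Y is X passed through a
    binary symmetric channel that flips each bit with prob. p_flip."""
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [b ^ (1 if rng.random() < p_flip else 0) for b in x]
    return x, y

def agreement(a, b):
    """Fraction of positions where the two sequences agree,
    a simple statistic reflecting the strength of correlation."""
    return sum(int(u == v) for u, v in zip(a, b)) / len(a)

rng = random.Random(0)
# Strongly correlated pair: only 5% of letters are flipped.
x, y = correlated_pair(30000, p_flip=0.05, rng=rng)
ex, ey = majority3_encode(x), majority3_encode(y)
print("agreement of raw letters:    ", agreement(x, y))
print("agreement of majority bits:  ", agreement(ex, ey))
```

The encoder compresses 30000 letters per terminal down to 10000 bits, and the agreement statistic of the compressed streams can then be compared against that of the raw streams to study how much information about the correlation survives the compression.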