Objective priors from maximum entropy in data classification

Lack of knowledge of the prior distribution in classification problems that operate on small data sets can make the application of Bayes' rule questionable. Uniform or arbitrary priors may yield classification answers that, even in simple examples, contradict our common sense about the problem. Entropic priors (EPs), obtained through the maximum entropy (ME) principle, appear to provide good objective answers in practical cases, leading to more conservative Bayesian inferences. EPs are derived and applied to classification tasks in which only the likelihood functions are available. In this paper, for inference based on a single sample, we review the use of EPs and compare them to priors obtained by maximizing the mutual information between observations and classes. This latter criterion coincides with maximizing the KL divergence between posterior and prior, which for large sample sets leads to the well-known reference (or Bernardo's) priors. Our comparison on single samples puts both approaches in perspective and clarifies their differences and potential. A combinatorial justification for EPs, inspired by Wallis' combinatorial argument for the definition of entropy, is also included. We also consider the application of EPs to sequences (multiple samples), which may suffer from excessive domination by the class with maximum entropy, and propose a solution that guarantees posterior consistency. An explicit iterative algorithm is proposed for determining the EP solely from knowledge of the likelihood functions. Simulations comparing EPs with uniform priors on short sequences are also included.
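
As a concrete illustration of how a prior can be computed solely from the likelihood functions, the following minimal sketch handles the simplest discrete case, where maximizing the joint entropy over the class probabilities admits the closed form P(c) proportional to exp(H(X|c)). The function names (entropic_prior, posterior) and the toy likelihood table are illustrative assumptions; this closed-form sketch is not the paper's iterative algorithm for the general case.

import numpy as np

def entropic_prior(likelihoods):
    """Entropic prior P(c) proportional to exp(H(X|c)), computed from discrete likelihood rows."""
    L = np.asarray(likelihoods, dtype=float)
    # Shannon entropy (in nats) of each class-conditional distribution p(x|c);
    # zero-probability cells contribute zero to the sum.
    H = -np.sum(L * np.log(np.where(L > 0, L, 1.0)), axis=1)
    w = np.exp(H - H.max())          # subtract the maximum entropy for numerical stability
    return w / w.sum()

def posterior(prior, likelihoods, x):
    """Bayes' rule for a single observed outcome with index x."""
    L = np.asarray(likelihoods, dtype=float)
    joint = prior * L[:, x]
    return joint / joint.sum()

# Toy example: two classes over three discrete outcomes.
lik = [[0.8, 0.1, 0.1],        # peaked (low-entropy) class
       [1/3, 1/3, 1/3]]        # flat (high-entropy) class
ep = entropic_prior(lik)       # the EP favors the higher-entropy class
print(ep, posterior(ep, lik, x=0))

With these numbers the EP is roughly (0.39, 0.61): it gives more prior weight to the class whose likelihood is least informative, the same preference that, over long sequences, can produce the domination issue addressed in the abstract.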
