Markov Property in Generative Classifiers

We show that, for generative classifiers, conditional independence corresponds to linear constraints on the induced discrimination functions. The discrimination functions of undirected Markov network classifiers can thus be characterized by sets of linear constraints. These constraints are expressed through a second-order finite difference operator acting on functions of categorical variables. As an application, we study the expressive power of generative classifiers under the undirected Markov property, and we present a general method for combining discriminative and generative classifiers.
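To make the kind of constraint concrete, the following is a minimal illustrative sketch (not taken from the paper body, and using illustrative notation) for the naive Bayes case with two binary features X_1, X_2 and class Y. Conditional independence of X_1 and X_2 given Y makes the log-odds discrimination function additive in the two coordinates, and additivity is exactly the vanishing of the second-order mixed finite difference:

\[
f(x_1,x_2) \;=\; \log\frac{p(Y=1\mid x_1,x_2)}{p(Y=0\mid x_1,x_2)}
           \;=\; \log\frac{p(Y=1)}{p(Y=0)}
           \;+\; \log\frac{p(x_1\mid Y=1)}{p(x_1\mid Y=0)}
           \;+\; \log\frac{p(x_2\mid Y=1)}{p(x_2\mid Y=0)},
\]
\[
\Delta^{2}_{12} f \;:=\; f(1,1) \;-\; f(1,0) \;-\; f(0,1) \;+\; f(0,0) \;=\; 0 .
\]

In the additive form each coordinate's contribution cancels pairwise in the alternating sum, so the mixed difference is zero; conversely, a nonzero mixed difference certifies an interaction between X_1 and X_2 that no model satisfying this conditional independence can produce. The linear constraints referred to in the abstract are of this type, one per pair of variables rendered conditionally independent by the Markov structure.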
