Discriminative Learning of Bayesian Network Classifiers via the TM Algorithm

The learning of probabilistic classification models can be approached from either a generative or a discriminative point of view. Generative methods attempt to maximize the unconditional log-likelihood, that is, the log-likelihood of the joint distribution of the predictive variables and the class. Discriminative methods instead aim to maximize the conditional log-likelihood of the class given the predictive variables. The parameters of Bayesian network classifiers are usually learned by generative rather than discriminative methods. Recently, however, several numerical approaches to the discriminative learning of Bayesian network classifiers have been proposed. This paper presents a new statistical approach to the discriminative learning of these classifiers based on an adaptation of the TM algorithm [2]. In addition, we test the TM algorithm with several Bayesian network classifier models and provide empirical evidence of the performance of this method.
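To make the two objectives concrete, the following is a minimal Python sketch of discriminative learning for a naive Bayes classifier, assuming discrete features. It writes naive Bayes as an exponential family whose sufficient statistics T are the class indicators and the (feature value, class) indicators. As we understand the TM step of Edwards and Lauritzen [2], it maximizes the joint log-likelihood minus a linear approximation of the marginal log-likelihood of the features. Under the exponential-family assumption this has a closed form: the model's expected sufficient statistics are shifted by the gradient of the conditional log-likelihood, E_new[T] = E_old[T] + (t_obs - sum_i E_old[T | x_i]) / N, and the new p(c) and p(x_j = v | c) are read off from those moments. The names (`fit_tm_naive_bayes` and the helpers) are hypothetical. Three further details are illustrative choices of ours, not taken from the paper: the Laplace pseudo-count used for the generative starting point, the made-up toy data, and a step-halving safeguard that rejects updates producing non-positive probabilities or a lower conditional log-likelihood.

```python
import math


def log_joint(row, c, prior, cond):
    # Naive Bayes: log p(x, c) = log p(c) + sum_j log p(x_j | c).
    return math.log(prior[c]) + sum(math.log(cond[j, v, c]) for j, v in enumerate(row))


def log_posteriors(row, classes, prior, cond):
    # log p(c | x) via a numerically stable log-sum-exp over the classes.
    logs = [log_joint(row, c, prior, cond) for c in classes]
    m = max(logs)
    lse = m + math.log(sum(math.exp(lj - m) for lj in logs))
    return {c: lj - lse for c, lj in zip(classes, logs)}


def joint_ll(X, y, prior, cond):
    # Generative (unconditional) objective: sum_i log p(x_i, c_i).
    return sum(log_joint(r, c, prior, cond) for r, c in zip(X, y))


def cond_ll(X, y, classes, prior, cond):
    # Discriminative (conditional) objective: sum_i log p(c_i | x_i).
    return sum(log_posteriors(r, classes, prior, cond)[c] for r, c in zip(X, y))


def fit_tm_naive_bayes(X, y, iters=50, pseudo=1.0):
    """Hypothetical sketch: discriminative naive Bayes via TM-style tilted moment matching."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    values = [sorted({row[j] for row in X}) for j in range(d)]

    # Observed sufficient statistics: class counts N_c and counts N_{jvc}.
    n_c = {c: 0.0 for c in classes}
    n_jvc = {(j, v, c): 0.0 for j in range(d) for v in values[j] for c in classes}
    for row, c in zip(X, y):
        n_c[c] += 1
        for j, v in enumerate(row):
            n_jvc[j, v, c] += 1

    # Generative starting point: Laplace-smoothed maximum likelihood (our assumption).
    total = n + pseudo * len(classes)
    prior = {c: (n_c[c] + pseudo) / total for c in classes}
    cond = {(j, v, c): (n_jvc[j, v, c] + pseudo) / (n_c[c] + pseudo * len(values[j]))
            for (j, v, c) in n_jvc}

    cll = cond_ll(X, y, classes, prior, cond)
    for _ in range(iters):
        # CLL gradient w.r.t. the natural parameters: g = t_obs - sum_i E[T | x_i].
        g_c, g_jvc = dict(n_c), dict(n_jvc)
        for row in X:
            for c, lp in log_posteriors(row, classes, prior, cond).items():
                p = math.exp(lp)
                g_c[c] -= p
                for j, v in enumerate(row):
                    g_jvc[j, v, c] -= p
        step = 1.0  # step = 1 is the plain TM update
        for _ in range(30):
            # Tilted moments: E_new[T] = E_old[T] + step * g / n.
            mu_c = {c: prior[c] + step * g_c[c] / n for c in classes}
            mu_jvc = {k: prior[k[2]] * cond[k] + step * g_jvc[k] / n for k in cond}
            if min(mu_c.values()) > 0 and min(mu_jvc.values()) > 0:
                new_prior = mu_c
                new_cond = {k: mu / mu_c[k[2]] for k, mu in mu_jvc.items()}
                new_cll = cond_ll(X, y, classes, new_prior, new_cond)
                if new_cll >= cll:  # safeguard (our addition): never lower the CLL
                    prior, cond, cll = new_prior, new_cond, new_cll
                    break
            step /= 2  # shrink the step and retry; after 30 tries keep the old parameters
    return classes, prior, cond


if __name__ == "__main__":
    # Made-up toy data: feature 1 duplicates feature 0, violating the naive Bayes assumption.
    X = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
    y = ["a", "a", "b", "b", "b", "b", "a", "a"]
    for iters in (0, 25):
        classes, prior, cond = fit_tm_naive_bayes(X, y, iters=iters)
        print(iters, joint_ll(X, y, prior, cond), cond_ll(X, y, classes, prior, cond))
```

With `iters=0` the function returns the generative (smoothed maximum-likelihood) estimate, so comparing the two printed rows shows the trade-off the abstract describes: the discriminative fit targets the conditional log-likelihood, possibly at the expense of the joint log-likelihood. Because of the safeguard, the conditional log-likelihood in this sketch never decreases from one iteration to the next. The plain TM update (step = 1) carries no such guarantee.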

[1] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems. 1936.

[2] S. Lauritzen et al. The TM Algorithm for Maximising a Conditional Likelihood Function. 2001.

[3] Judea Pearl et al. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 1991, Morgan Kaufmann Series in Representation and Reasoning.

[4] Nir Friedman et al. Bayesian Network Classifiers. 1997, Machine Learning.

[5] Pedro Larrañaga et al. Feature Subset Selection by Genetic Algorithms and Estimation of Distribution Algorithms: A Case Study in the Survival of Cirrhotic Patients Treated with TIPS. 2001, Artif. Intell. Medicine.

[6] Ron Kohavi et al. Wrappers for Feature Subset Selection. 1997, Artif. Intell.

[7] Christopher M. Bishop et al. Neural Networks for Pattern Recognition. 1995.

[8] Bin Shen et al. Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers. 2002, Machine Learning.

[9] Catherine Blake et al. UCI Repository of Machine Learning Databases. 1998.

[10] Henry Tirri et al. On Discriminative Bayesian Network Classifiers and Logistic Regression. 2005, Machine Learning.

[11] Richard O. Duda et al. Pattern Classification and Scene Analysis. 1974, A Wiley-Interscience Publication.

[12] Rolf Sundberg. The Convergence Rate of the TM Algorithm of Edwards & Lauritzen. 2002.

[13] Michael I. Jordan et al. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. 2001, NIPS.

[14] A. P. Dawid et al. Properties of Diagnostic Data Distributions. 1976, Biometrics.

[15] H. B. Mann et al. On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other. 1947.

[16] Trevor J. Hastie et al. Discriminative vs Informative Learning. 1997, KDD.

[17] Richard E. Neapolitan et al. Learning Bayesian Networks. 2007, KDD '07.

[18] David W. Hosmer et al. Applied Logistic Regression. 1991.

[19] Usama M. Fayyad et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. 1993, IJCAI.

[20] Ron Kohavi et al. Supervised and Unsupervised Discretization of Continuous Features. 1995, ICML.