Learning Conditional Linear Gaussian Classifiers with Probabilistic Class Labels

We study the problem of learning Bayesian classifiers (BC) when the true class label of the training instances is not known, and is substituted by a probability distribution over the class labels for each instance. This scenario can arise, e.g., when a group of experts is asked to individually provide a class label for each instance. We particularize the generalized expectation maximization (GEM) algorithm in [1] to learn BCs with different structural complexities: naive Bayes, averaged one-dependence estimators or general conditional linear Gaussian classifiers. An evaluation conducted on eight datasets shows that BCs learned with GEM perform better than those using either the classical Expectation Maximization algorithm or potentially wrong class labels. BCs achieve similar results to the multivariate Gaussian classifier without having to estimate the full covariance matrices.

[1]  Thierry Denoeux,et al.  Learning from partially supervised data using mixture models and belief functions , 2009, Pattern Recognit..

[2]  N. Wermuth,et al.  Graphical Models for Associations between Variables, some of which are Qualitative and some Quantitative , 1989 .

[3]  S. García,et al.  An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons , 2008 .

[4]  Geoffrey I. Webb,et al.  Not So Naive Bayes: Aggregating One-Dependence Estimators , 2005, Machine Learning.

[5]  Michael Clarke,et al.  Symbolic and Quantitative Approaches to Reasoning and Uncertainty , 1991, Lecture Notes in Computer Science.

[6]  Nir Friedman,et al.  Learning Belief Networks in the Presence of Missing Values and Hidden Variables , 1997, ICML.

[7]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[8]  Patrick Vannoorenberghe,et al.  Partially Supervised Learning by a Credal EM Approach , 2005, ECSQARU.

[9]  Nir Friedman,et al.  Bayesian Network Classification with Continuous Attributes: Getting the Best of Both Discretization and Parametric Fitting , 1998, ICML.

[10]  Philippe Smets,et al.  The Transferable Belief Model , 1994, Artif. Intell..

[11]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[12]  Marvin Minsky,et al.  Steps toward Artificial Intelligence , 1995, Proceedings of the IRE.