Bayesian Model Averaging of TAN Models for Clustering

Selecting a single model for clustering ignores the uncertainty, inherent in finite data, about which model correctly describes the dataset. The fewer samples the dataset has, the greater the uncertainty in model selection. In these cases a Bayesian approach may be beneficial, but it is usually computationally intractable and only approximations are feasible. For supervised classification problems, it has been demonstrated that model averaging calculations are feasible and efficient under some restrictions. In this paper, we extend the expectation model averaging (EMA) algorithm, originally proposed in Santafe et al. (2006) for model averaging of naive Bayes models for clustering. The extended algorithm, EMA-TAN, efficiently approximates model averaging over the class of tree augmented naive Bayes (TAN) models for clustering. We also present empirical results showing that the EMA algorithm based on TAN outperforms other clustering methods.
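As a reminder of what model averaging means in general, the standard Bayesian model averaging identity can be written as below. The notation is an assumption made here for illustration and is not taken from the paper: $x$ is a new instance, $D$ the dataset, and $\mathcal{M}$ the set of candidate models (for example, TAN structures).

```latex
% Generic Bayesian model averaging (illustrative notation, not the paper's):
% predictions are averaged over models, weighted by posterior model probability.
\begin{align}
  p(x \mid D) &= \sum_{M \in \mathcal{M}} p(x \mid M, D)\, p(M \mid D), \\
  p(M \mid D) &\propto p(D \mid M)\, p(M).
\end{align}
```

Model selection corresponds to keeping only the single term with the largest $p(M \mid D)$; averaging keeps all terms, which matters most when the data are too few for one model to dominate.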

[1]  Finn V. Jensen, et al. Bayesian Networks and Decision Graphs, 2001, Statistics for Engineering and Information Science.

[2]  Thomas P. Minka, et al. Bayesian model averaging is not model combination, 2002.

[3]  Ramón López de Mántaras, et al. TAN Classifiers Based on Decomposable Distributions, 2005, Machine Learning.

[4]  Pedro M. Domingos. Bayesian Averaging of Classifiers and the Overfitting Problem, 2000, ICML.

[5]  Bertrand Clarke, et al. Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored, 2003, J. Mach. Learn. Res.

[6]  J.A. Lozano, et al. Bayesian Model Averaging of Naive Bayes for Clustering, 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[7]  Nir Friedman, et al. Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks, 2004, Machine Learning.

[8]  Thomas D. Nielsen, et al. Bayesian Networks as Classifiers, 2007.

[9]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[10]  Peter C. Cheeseman, et al. Bayesian Classification (AutoClass): Theory and Results, 1996, Advances in Knowledge Discovery and Data Mining.

[11]  Adrian E. Raftery, et al. Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data, 2005, Bioinform.

[12]  Nir Friedman, et al. Bayesian Network Classifiers, 1997, Machine Learning.

[13]  David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[14]  Pedro Larrañaga, et al. Learning Recursive Bayesian Multinets for Data Clustering by Means of Constructive Induction, 2002, Machine Learning.

[15]  Nir Friedman, et al. The Bayesian Structural EM Algorithm, 1998, UAI.

[16]  Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[17]  Gregory F. Cooper, et al. Model Averaging for Prediction with Discrete Bayesian Networks, 2004, J. Mach. Learn. Res.

[18]  Gregory F. Cooper, et al. A Bayesian method for the induction of probabilistic networks from data, 1992, Machine Learning.

[19]  D. Madigan, et al. Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window, 1994.