Learning Recursive Bayesian Multinets for Data Clustering by Means of Constructive Induction

This paper introduces and evaluates a new class of knowledge model, the recursive Bayesian multinet (RBMN), which encodes the joint probability distribution of a given database. RBMNs extend Bayesian networks (BNs) as well as partitional clustering systems. Briefly, an RBMN is a decision tree with component BNs at the leaves. An RBMN is learnt using a greedy, heuristic approach akin to that used by many supervised decision tree learners, but in which the BNs at the leaves are learnt by means of constructive induction. A key idea is to treat expected data as real data. This allows us to complete the database and to take advantage of a closed form for the marginal likelihood of the expected complete data that factorizes into separate marginal likelihoods for each family (a node and its parents). Our approach is evaluated on synthetic and real-world databases.
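The idea of treating expected data as real data, and the family-wise factorization of the marginal likelihood, can be illustrated with a small sketch. It is a minimal illustration rather than the paper's implementation, and it rests on several assumptions:

- all variables are discrete;
- the hidden cluster variable is named `C`;
- every Dirichlet hyperparameter equals 1, which is the Cooper-Herskovits (K2) metric;
- the posterior cluster probabilities p(C = c | x) have already been computed for each case (for example, by an E-step).

Each incomplete case is completed once per cluster value and weighted by its posterior. The resulting fractional counts are then plugged into the closed form exactly as real counts would be. All function names are hypothetical.

```python
from collections import defaultdict
from math import lgamma


def family_log_ml(counts, r):
    """Log marginal likelihood of one family (a node and its parents).

    counts maps each parent configuration to a list of r (possibly fractional)
    counts, one per state of the node. Assumes every Dirichlet hyperparameter
    is 1 (Cooper-Herskovits / K2 metric), so alpha_ij = r.
    """
    total = 0.0
    for n_jk in counts.values():
        n_j = sum(n_jk)
        # Gamma(alpha_ij) / Gamma(alpha_ij + N_ij)
        total += lgamma(r) - lgamma(r + n_j)
        # prod_k Gamma(alpha_ijk + N_ijk) / Gamma(alpha_ijk), with alpha_ijk = 1
        total += sum(lgamma(1.0 + n) - lgamma(1.0) for n in n_jk)
    return total


def expected_counts(data, posteriors, node, parents, arity, hidden="C"):
    """Expected sufficient statistics N_ijk for one family.

    Each case is completed once per value c of the hidden cluster variable
    and weighted by p(C = c | case): expected data is treated as real,
    fractionally weighted data.
    """
    counts = defaultdict(lambda: [0.0] * arity[node])
    for case, post in zip(data, posteriors):
        for c, w in enumerate(post):
            full = {**case, hidden: c}  # completed case
            config = tuple(full[p] for p in parents)
            counts[config][full[node]] += w
    return dict(counts)


def log_ml(structure, data, posteriors, arity):
    """Score of a structure: a sum of independent per-family terms."""
    return sum(
        family_log_ml(expected_counts(data, posteriors, node, parents, arity),
                      arity[node])
        for node, parents in structure.items()
    )


if __name__ == "__main__":
    arity = {"C": 2, "X1": 2, "X2": 2}
    data = [{"X1": 0, "X2": 0}, {"X1": 1, "X2": 1}, {"X1": 0, "X2": 1}]
    posteriors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # hypothetical E-step output
    naive = {"C": (), "X1": ("C",), "X2": ("C",)}           # naive Bayes clustering model
    augmented = {"C": (), "X1": ("C",), "X2": ("C", "X1")}  # adds the arc X1 -> X2
    print(log_ml(naive, data, posteriors, arity))
    print(log_ml(augmented, data, posteriors, arity))
```

Because the score is a sum of per-family terms, a greedy structure search only needs to recompute the term of the family that a move changes. Adding a parent to one node is an example of such a move.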
