Learning mixtures of polynomials of multidimensional probability densities from data using B-spline interpolation

Non-parametric density estimation is an important technique in probabilistic modeling and reasoning under uncertainty. We present a method for learning mixture of polynomials (MoP) approximations of one-dimensional and multidimensional probability densities from data. The method is based on basis spline (B-spline) interpolation, where a density is approximated as a linear combination of B-splines. We compute maximum likelihood estimators of the mixing coefficients of the linear combination. The Bayesian information criterion is used as the score function to select the order of the polynomials and the number of pieces of the MoP. The method is evaluated in two ways. First, we assess the quality of the fit: we sample artificial datasets from known one-dimensional and multidimensional densities and learn MoP approximations from them. The quality of the approximations is analyzed according to different criteria, and the proposed method is compared with MoPs learned with Lagrange interpolation and with mixtures of truncated basis functions. Second, the proposed method is used as a non-parametric density estimation technique in Bayesian classifiers. Two of the most widely studied Bayesian classifiers, the naive Bayes and tree-augmented naive Bayes classifiers, are implemented and compared. Results on real datasets show that the non-parametric Bayesian classifiers using MoPs are comparable to the kernel density-based Bayesian classifiers. We provide a free R package implementing the proposed methods.
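The one-dimensional pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R package, and several choices are assumptions made only for illustration: uniform clamped knots, an EM-style fixed-point update as the maximum likelihood procedure for the mixing coefficients, a BIC convention of log-likelihood minus half the number of free parameters times log N (higher is better), and the hypothetical function names `bspline`, `fit_weights`, `fit_mop` and `mop_density`.

```python
import math
import random


def bspline(i, order, knots, x):
    """Cox-de Boor recursion for the i-th B-spline of a given order (degree = order - 1)."""
    if order == 1:
        # Close the last non-empty interval on the right so x == knots[-1] is covered.
        at_right_end = x == knots[-1] and knots[i] < knots[i + 1] == x
        return 1.0 if knots[i] <= x < knots[i + 1] or at_right_end else 0.0
    value = 0.0
    left_span = knots[i + order - 1] - knots[i]
    if left_span > 0:
        value += (x - knots[i]) / left_span * bspline(i, order - 1, knots, x)
    right_span = knots[i + order] - knots[i + 1]
    if right_span > 0:
        value += (knots[i + order] - x) / right_span * bspline(i + 1, order - 1, knots, x)
    return value


def clamped_knots(a, b, order, pieces):
    """Uniform knots on [a, b] with both end knots repeated `order` times (an assumption)."""
    inner = [a + (b - a) * j / pieces for j in range(1, pieces)]
    return [a] * order + inner + [b] * order


def density_basis(x, order, knots):
    """B-splines rescaled to integrate to one, so a convex combination is a density."""
    n_basis = len(knots) - order
    return [order / (knots[i + order] - knots[i]) * bspline(i, order, knots, x)
            for i in range(n_basis)]


def log_likelihood(weights, design):
    return sum(math.log(sum(w * m for w, m in zip(weights, row))) for row in design)


def fit_weights(design, iterations=50):
    """EM-style fixed-point updates for the mixing weights (kept non-negative, sum to one)."""
    n, k = len(design), len(design[0])
    weights = [1.0 / k] * k
    for _ in range(iterations):
        totals = [0.0] * k
        for row in design:
            f = sum(w * m for w, m in zip(weights, row))
            for i in range(k):
                totals[i] += weights[i] * row[i] / f
        weights = [t / n for t in totals]
    return weights


def fit_mop(data, a, b, orders=(2, 3, 4), max_pieces=4):
    """Grid search over polynomial order and number of pieces, scored by BIC."""
    best = None
    for order in orders:
        for pieces in range(1, max_pieces + 1):
            knots = clamped_knots(a, b, order, pieces)
            design = [density_basis(x, order, knots) for x in data]
            weights = fit_weights(design)
            loglik = log_likelihood(weights, design)
            free_params = len(weights) - 1  # the weights sum to one
            bic = loglik - 0.5 * free_params * math.log(len(data))
            if best is None or bic > best["bic"]:
                best = {"bic": bic, "loglik": loglik, "order": order,
                        "pieces": pieces, "knots": knots, "weights": weights}
    return best


def mop_density(x, model):
    """Evaluate the fitted piecewise polynomial density at x."""
    basis = density_basis(x, model["order"], model["knots"])
    return sum(w * m for w, m in zip(model["weights"], basis))


random.seed(0)
sample = [random.triangular(0.0, 1.0, 0.5) for _ in range(300)]  # synthetic data on [0, 1]
model = fit_mop(sample, 0.0, 1.0)
print(model["order"], model["pieces"], round(model["bic"], 2))
```

Because each rescaled B-spline integrates to one and the weights stay on the probability simplex, the fitted function is a valid density on the chosen interval without any further normalization. A multidimensional variant would need tensor products of the one-dimensional bases, which this sketch does not attempt.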
