Efficient Probabilistic Classification Vector Machine With Incremental Basis Function Selection

The probabilistic classification vector machine (PCVM) is a sparse learning approach that addresses the instability of the relevance vector machine (RVM) on classification problems. Because PCVM is trained with the expectation-maximization (EM) algorithm, it is sensitive to initialization, can converge to local minima, and yields only point estimates rather than full Bayesian posteriors. PCVM is also inefficient on large data sets. To address these problems, this paper proposes an efficient PCVM (EPCVM) that sequentially adds or deletes basis functions according to marginal likelihood maximization, which makes training efficient. Because EPCVM uses a truncated prior, two approximation techniques, Laplace approximation and expectation propagation (EP), are employed to obtain fully Bayesian solutions; both approximations are verified against a hybrid Monte Carlo approach. The generalization performance and computational efficiency of EPCVM are evaluated extensively, and a theoretical analysis based on Rademacher complexity relates the sparsity of EPCVM to its generalization bound.
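To make the incremental selection step concrete, below is a minimal sketch of sequential basis-function selection by marginal likelihood maximization, in the style of fast sparse Bayesian learning (Tipping and Faul, 2003). It is not the paper's EPCVM algorithm: it assumes a Gaussian regression likelihood with fixed noise precision `beta` instead of the classification likelihood and truncated prior, and the function name `sequential_sbl` is illustrative only.

```python
import numpy as np

def sequential_sbl(Phi, t, beta=100.0, n_sweeps=20):
    """Sequential sparse Bayesian learning (sketch): returns the active
    column indices of Phi, their prior precisions alpha, and the
    posterior mean weights mu."""
    N, M = Phi.shape
    alpha = np.full(M, np.inf)              # all basis functions pruned

    # Initialize with the basis function best aligned with the targets.
    norms = np.sum(Phi ** 2, axis=0)
    proj = (Phi.T @ t) ** 2 / norms         # (phi_i^T t)^2 / ||phi_i||^2
    i0 = int(np.argmax(proj))
    alpha[i0] = norms[i0] / max(proj[i0] - 1.0 / beta, 1e-12)

    for it in range(n_sweeps * M):
        active = np.where(np.isfinite(alpha))[0]
        Pa = Phi[:, active]
        Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Pa.T @ Pa)

        # Sparsity S_m = phi_m^T C^{-1} phi_m and quality Q_m = phi_m^T C^{-1} t,
        # with C^{-1} = beta*I - beta^2 * Pa Sigma Pa^T (Woodbury identity).
        CinvPhi = beta * Phi - beta ** 2 * Pa @ (Sigma @ (Pa.T @ Phi))
        S = np.sum(Phi * CinvPhi, axis=0)
        Q = CinvPhi.T @ t

        s, q = S.copy(), Q.copy()
        fin = np.isfinite(alpha)            # bases currently in the model
        denom = alpha[fin] - S[fin]
        s[fin] = alpha[fin] * S[fin] / denom
        q[fin] = alpha[fin] * Q[fin] / denom

        m = it % M                          # cycle through all candidates
        theta = q[m] ** 2 - s[m]
        if theta > 0:
            alpha[m] = s[m] ** 2 / theta    # add or re-estimate basis m
        elif np.isfinite(alpha[m]) and len(active) > 1:
            alpha[m] = np.inf               # delete (prune) basis m

    active = np.where(np.isfinite(alpha))[0]
    Pa = Phi[:, active]
    Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Pa.T @ Pa)
    mu = beta * Sigma @ Pa.T @ t
    return active, alpha[active], mu
```

Called with an N-by-M kernel design matrix and targets t, most alpha values stay infinite at convergence, so the returned model uses only a small subset of basis functions; this greedy add/re-estimate/delete cycle is what makes the sequential approach efficient on large data sets.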
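As background for the approximations named above, the generic Laplace step replaces an intractable posterior with a Gaussian centered at its mode. This is the textbook form, not the paper's specific derivation for the truncated prior:

```latex
p(\mathbf{w}\mid \mathbf{t}) \;\approx\;
\mathcal{N}\!\left(\mathbf{w}\mid \mathbf{w}_{\mathrm{MP}},\, \boldsymbol{\Sigma}\right),
\qquad
\boldsymbol{\Sigma} =
\left.\left(-\nabla\nabla \log p(\mathbf{w}, \mathbf{t})\right)^{-1}
\right|_{\mathbf{w}=\mathbf{w}_{\mathrm{MP}}},
```

where w_MP maximizes the joint log density. EP, by contrast, builds the Gaussian approximation by iteratively matching the moments of each likelihood term, which is why the paper can cross-check both against hybrid Monte Carlo sampling.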
