Nonlinear Models Using Dirichlet Process Mixtures

We introduce a new nonlinear model for classification, in which we model the joint distribution of the response variable, y, and the covariates, x, nonparametrically using Dirichlet process mixtures. Within each component of the mixture, we keep the relationship between y and x linear. The overall relationship becomes nonlinear when the mixture contains more than one component and the components have different regression coefficients. We use simulated data to compare the performance of this new approach with that of alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson's disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), it can sometimes provide insight into hidden structure in the data.
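As a rough illustration of how locally linear components can combine into a nonlinear classifier, the sketch below evaluates P(y = 1 | x) for a fixed two-component mixture. It is not the paper's implementation. It assumes a one-dimensional x, a Gaussian distribution for x within each component, and a two-class logistic link. The names `COMPONENTS`, `responsibilities`, and `prob_y1`, and all parameter values, are hypothetical. The Dirichlet process prior and posterior sampling are omitted, so the number of components is fixed by hand rather than inferred.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sigmoid(t):
    """Logistic function, used here as the within-component link."""
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical components (illustrative values, not estimates from data):
# (mixture weight, mean of x, sd of x, intercept, slope)
COMPONENTS = [
    (0.5, -2.0, 1.0, 2.0, 2.0),   # around x = -2, P(y=1) increases with x
    (0.5, 2.0, 1.0, 2.0, -2.0),   # around x = +2, P(y=1) decreases with x
]


def responsibilities(x, components):
    """Probability that x belongs to each component, given the x-densities."""
    joint = [w * normal_pdf(x, m, s) for (w, m, s, _, _) in components]
    total = sum(joint)
    return [j / total for j in joint]


def prob_y1(x, components):
    """P(y=1 | x): within-component logistic fits, averaged by responsibility."""
    resp = responsibilities(x, components)
    return sum(
        r * sigmoid(a + b * x)
        for r, (_, _, _, a, b) in zip(resp, components)
    )


if __name__ == "__main__":
    for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
        print(f"x = {x:+.1f}  P(y=1|x) = {prob_y1(x, COMPONENTS):.4f}")
```

Each component on its own is linear on the logit scale, yet the combined P(y = 1 | x) is low at both extremes of x and high in the middle, a pattern that no single logistic model in x can produce. The responsibilities also show how observations are softly assigned to sub-populations.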
