A Bayesian Approach to Model Selection in Hierarchical Mixtures-of-Experts Architectures

No single statistical model performs well on all tasks, so the model selection problem is unavoidable: investigators must decide which model best summarizes the data for each task of interest. This article presents an approach to model selection in hierarchical mixtures-of-experts architectures. These architectures combine aspects of generalized linear models with those of finite mixture models to perform tasks via a recursive "divide-and-conquer" strategy. Markov chain Monte Carlo methodology is used to estimate the distribution of the architectures' parameters. One part of our approach to model selection estimates the worth of each component of an architecture so that relatively unused components can be pruned from the architecture's structure. A second part uses a Bayesian hypothesis testing procedure to differentiate inputs that carry useful information from nuisance inputs. Simulation results suggest that the approach adheres to the dictum of Occam's razor: simple architectures that adequately summarize the data are favored over more complex structures. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
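To make the "divide-and-conquer" idea concrete, the following is a minimal sketch of a single-level mixture of experts for regression: a gating network produces softmax mixing proportions over a set of linear (identity-link GLM) experts, and the prediction is the gate-weighted combination of the experts' outputs. The hierarchical version in the article nests such gates recursively; the function names (`softmax`, `moe_predict`) and the choice of linear experts here are illustrative assumptions, not the article's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the expert axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_predict(x, gate_W, expert_Ws):
    """One-level mixture-of-experts prediction for regression.

    x         : (d,) input vector
    gate_W    : (k, d) gating weights; softmax(gate_W @ x) gives the
                mixing proportions over the k experts
    expert_Ws : (k, d) per-expert linear weights (each expert is a
                GLM with identity link)
    """
    g = softmax(gate_W @ x)        # mixing proportions; sum to 1
    y_experts = expert_Ws @ x      # each expert's scalar prediction
    return float(g @ y_experts), g # gate-weighted combination

# Tiny illustrative call with random parameters (hypothetical values).
rng = np.random.default_rng(0)
x = rng.normal(size=3)
gate_W = rng.normal(size=(2, 3))
expert_Ws = rng.normal(size=(2, 3))
y, g = moe_predict(x, gate_W, expert_Ws)
```

Pruning a "relatively unused" component, in this sketch, would correspond to removing an expert whose mixing proportion stays near zero across the training data.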
