Bias/Variance Analyses of Mixtures-of-Experts Architectures

This article investigates the bias and variance of mixtures-of-experts (ME) architectures. The variance of an ME architecture can be expressed as the sum of two terms: the first is related to the variances of the individual expert networks that comprise the architecture, and the second to the covariances among the experts' estimates. One goal of this article is to study and quantify a number of properties of ME architectures via the metrics of bias and variance. A second goal is to clarify the relationships between this class of systems and other recently proposed systems. It is shown that, in contrast to systems that produce unbiased experts whose estimation errors are uncorrelated, ME architectures produce biased experts whose estimates are negatively correlated.
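The variance decomposition described above can be checked numerically. The sketch below is a hypothetical illustration, not the paper's experiments: it uses synthetic "expert" estimates with uniform combination weights (an assumption standing in for a learned gating network) and verifies that the variance of the combined estimate splits exactly into an expert-variance term and a covariance term.

```python
import numpy as np

# Hypothetical illustration of the abstract's variance decomposition:
# Var(sum_i w_i f_i) = sum_i w_i^2 Var(f_i)  +  sum_{i != j} w_i w_j Cov(f_i, f_j)
rng = np.random.default_rng(0)

n_experts, n_trials = 3, 100_000
base = rng.normal(size=(n_trials, 1))           # shared signal each expert tries to estimate
noise = rng.normal(size=(n_trials, n_experts))  # per-expert estimation noise

# Centering the noise across experts makes their errors negatively correlated,
# echoing the abstract's point about ME experts.
experts = base + noise - noise.mean(axis=1, keepdims=True)

weights = np.full(n_experts, 1.0 / n_experts)   # uniform weights (assumption)
combined = experts @ weights

cov = np.cov(experts, rowvar=False)
var_term = np.sum(weights**2 * np.diag(cov))    # experts' own variances
cov_term = weights @ cov @ weights - var_term   # cross-covariance contribution

# The two terms sum exactly to the variance of the combined estimate.
assert np.isclose(combined.var(ddof=1), var_term + cov_term)
```

Because the centered errors sum to zero across experts, they cancel in the combination: negative error correlation lets the ensemble achieve lower variance than any single expert, which is the intuition behind the abstract's contrast with uncorrelated-expert systems.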