Statistical Mechanics of the Mixture of Experts

We study the generalization capability of the mixture of experts learning from examples generated by another network with the same architecture. When the number of examples is smaller than a critical value, the network shows a symmetric phase in which the roles of the experts are not specialized. Upon crossing the critical point, the system undergoes a continuous phase transition to a symmetry-breaking phase in which the gating network partitions the input space effectively and each expert is assigned to an appropriate subspace. We also find that a mixture of experts with multiple levels of hierarchy exhibits multiple phase transitions.
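The architecture described above combines the outputs of several expert networks through a softmax gating network, which soft-partitions the input space. As a minimal sketch (assuming linear experts and a linear-softmax gate for illustration; the specific forms are not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10  # input dimension (illustrative)
K = 3   # number of experts (illustrative)

W = rng.standard_normal((K, N))  # one weight vector per expert
V = rng.standard_normal((K, N))  # gating network weights

def softmax(z):
    # Numerically stable softmax: gating probabilities over the K experts.
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_output(x):
    g = softmax(V @ x)  # gate: how strongly each expert "owns" input x
    y = W @ x           # each expert's scalar output on the same input
    return g @ y        # convex combination of expert outputs

x = rng.standard_normal(N)
print(moe_output(x))
```

In the symmetric phase the gating probabilities stay near uniform, so every expert responds to the whole input space; past the transition the gate concentrates, assigning each expert to its own subspace.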
