Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates, and the Cramer-Rao paradigm does no more hold, implying that the classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors both for the maximum likelihood estimator and the Bayesian predictive distribution in terms of Gaussian random fields, by using simple models.