Generalization Error and Training Error at Singularities of Multilayer Perceptrons

The neuromanifold, or the parameter space of multilayer perceptrons, includes complex singularities at which the Fisher information matrix degenerates. The parameters are unidentifiable at singularities, and this causes serious difficulties in learning, known as plateaus in the cost function. The natural or adaptive natural gradient method has been proposed to overcome this difficulty. It is important to study the relation between the generalization error and the training error at the singularities, because the generalization error is estimated in terms of the training error. Using a simple model, the generalization error is studied in terms of Gaussian random fields, both for the maximum likelihood estimator (mle) and for the Bayesian predictive distribution estimator. This elucidates the strange behaviors of learning dynamics around singularities.
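As an illustrative sketch (not part of the paper's analysis), the degeneracy of the Fisher information matrix at a singularity can be seen numerically in a toy two-hidden-unit perceptron f(x) = v1 tanh(w1 x) + v2 tanh(w2 x) with Gaussian noise: when the two input weights coincide (w1 = w2), the gradients of the output with respect to the parameters become linearly dependent, and the empirical Fisher matrix loses rank. All parameter values below are hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)  # inputs drawn from a standard Gaussian

def grad_f(w1, w2, v1, v2, x):
    # Gradient of f(x) = v1*tanh(w1*x) + v2*tanh(w2*x)
    # with respect to (w1, w2, v1, v2), evaluated at each sample.
    return np.stack([
        v1 * (1 - np.tanh(w1 * x) ** 2) * x,
        v2 * (1 - np.tanh(w2 * x) ** 2) * x,
        np.tanh(w1 * x),
        np.tanh(w2 * x),
    ])

def fisher(w1, w2, v1, v2, x):
    # Empirical Fisher information for a Gaussian regression model
    # with unit noise variance: the sample average of grad * grad^T.
    g = grad_f(w1, w2, v1, v2, x)
    return g @ g.T / x.size

# At a generic point the Fisher matrix has full rank 4.
F_generic = fisher(1.0, -0.5, 0.8, 0.3, x)

# At the singularity w1 == w2 the two tanh gradients coincide and the
# two input-weight gradients are proportional, so the rank drops to 2.
F_singular = fisher(1.0, 1.0, 0.8, 0.3, x)

print(np.linalg.matrix_rank(F_generic))   # 4
print(np.linalg.matrix_rank(F_singular))  # 2
```

The rank deficiency at w1 = w2 is exactly the unidentifiability described above: the model realizes the same function for a continuum of parameter values, so the Fisher metric degenerates along those directions.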