Statistical learning by natural gradient descent

Based on stochastic perceptron models and statistical inference, we train single-layer and two-layer perceptrons by natural gradient descent. We have discovered an efficient scheme to represent the Fisher information matrix of a stochastic two-layer perceptron, and based on this scheme we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). Simulations confirm that the natural gradient descent learning rule is not only efficient but also robust.
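To make the update rule concrete, the following is a minimal NumPy sketch of natural gradient descent for a single-layer stochastic perceptron with Gaussian output noise. It does not reproduce the paper's O(n) representation of the two-layer Fisher matrix; instead it estimates the Fisher information by Monte Carlo and solves the resulting linear system directly. All constants (noise level, learning rate, sample sizes) and the tanh transfer function are illustrative assumptions, not values taken from the paper.

```python
# Natural gradient descent for a single-layer stochastic perceptron
# y = tanh(w.x) + Gaussian noise.  Update: w <- w - eta * G^{-1} grad,
# where G is the Fisher information matrix of the model.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, eta = 10, 0.1, 0.5           # input dim, noise level, learning rate (assumed)
w_true = rng.standard_normal(n)         # hypothetical teacher weights
w = np.zeros(n)                         # student weights

def fisher(w, X, sigma):
    """Monte Carlo estimate of G = E[f'(w.x)^2 x x^T] / sigma^2 for f = tanh."""
    d = 1.0 - np.tanh(X @ w) ** 2       # f'(w.x) at each sample
    return (X * (d ** 2)[:, None]).T @ X / (len(X) * sigma ** 2)

for t in range(2000):
    x = rng.standard_normal(n)
    y = np.tanh(w_true @ x) + sigma * rng.standard_normal()  # noisy teacher output
    err = y - np.tanh(w @ x)
    # Gradient of the negative log-likelihood of the Gaussian noise model
    grad = -err * (1.0 - np.tanh(w @ x) ** 2) * x / sigma ** 2
    X = rng.standard_normal((200, n))                        # fresh inputs to estimate G
    G = fisher(w, X, sigma) + 1e-6 * np.eye(n)               # small ridge for invertibility
    w -= eta * np.linalg.solve(G, grad)                      # natural gradient step

print("distance to teacher:", np.linalg.norm(w - w_true))
```

In this sketch the explicit solve costs O(n^3) per step; the point of the paper's scheme is that, for a two-layer perceptron whose input dimension n greatly exceeds the number of hidden neurons, the structure of the Fisher matrix lets the natural gradient be computed in O(n) instead.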