A probabilistic approach to the understanding and training of neural network classifiers

It is shown that training a neural network with a mean-square-error criterion yields network outputs that approximate the posterior class probabilities. Based on this probabilistic interpretation of the network's operation, information-theoretic training criteria such as maximum mutual information and the Kullback-Leibler measure are investigated. Both of these criteria are shown to be equivalent to maximum-likelihood estimation (MLE) of the network parameters. MLE of a network allows network models to be compared using the Akaike information criterion and the minimum-description-length criterion.
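
The two central claims can be made concrete with standard derivations; the sketch below is illustrative only, and its notation (targets t_k, outputs y_k(x), classes C_k, parameters theta) is mine rather than the paper's.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% (1) The mean-square-error minimizer is the posterior class probability.
% With one-hot targets $t_k \in \{0,1\}$ ($t_k = 1$ iff the true class is $C_k$),
% the expected squared error of output $y_k(\mathbf{x})$ decomposes as
\[
  E\!\left[\bigl(y_k(\mathbf{x}) - t_k\bigr)^2\right]
  = E\!\left[\bigl(y_k(\mathbf{x}) - P(C_k \mid \mathbf{x})\bigr)^2\right]
  + E\!\left[\operatorname{Var}(t_k \mid \mathbf{x})\right],
\]
% and the second term does not depend on the network, so the error is
% minimized by $y_k(\mathbf{x}) = P(C_k \mid \mathbf{x})$.

% (2) Minimizing the Kullback-Leibler measure is equivalent to MLE.
% With $\hat{p}$ the empirical (target) distribution and $p_{\theta}$ the
% network's output distribution,
\[
  D_{\mathrm{KL}}\!\left(\hat{p} \,\middle\|\, p_{\theta}\right)
  = \sum_{k} \hat{p}(C_k \mid \mathbf{x}) \log \hat{p}(C_k \mid \mathbf{x})
  - \sum_{k} \hat{p}(C_k \mid \mathbf{x}) \log p_{\theta}(C_k \mid \mathbf{x}),
\]
% where the first sum is constant in $\theta$; minimizing the KL measure
% therefore maximizes the expected log-likelihood of the network parameters.

\end{document}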