Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition

We are concerned with feed-forward non-linear networks (multi-layer perceptrons, or MLPs) with multiple outputs. We wish to treat the outputs of the network as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs. We look for appropriate output non-linearities and appropriate criteria for adapting the parameters of the network (e.g. weights). We explain two modifications: probability scoring, an alternative to squared-error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity. Together the two modifications yield quite simple arithmetic and are also straightforward to implement in hardware. The use of radial units (squared distance instead of dot product) immediately before the softmax output stage produces a network which computes posterior distributions over class labels under an assumption of Gaussian within-class distributions. However, because the training uses cross-class information, it can give better class discrimination than the usual within-class training method, except when the within-class distribution assumptions are actually correct.
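For concreteness, a minimal sketch of the quantities the abstract refers to; the abstract itself fixes no notation, so the symbols V_j, Q_j, m_j, sigma, and p_j below are our own labels. Writing V_j for the j-th pre-output activation and c for the true class of a training pattern, the normalised exponential (softmax) output and the probability score are

\[
Q_j = \frac{e^{V_j}}{\sum_k e^{V_k}}, \qquad J = -\log Q_c ,
\]

the latter in place of the squared-error criterion \(\sum_j (Q_j - \delta_{jc})^2\). If each V_j is produced by a radial unit,

\[
V_j = -\frac{\lVert x - m_j \rVert^2}{2\sigma^2} + \log p_j ,
\]

then \(Q_j\) equals the posterior \(P(j \mid x)\) under within-class densities \(N(m_j, \sigma^2 I)\) with class priors \(p_j\), since the common Gaussian normalising factor \((2\pi\sigma^2)^{-d/2}\) cancels in the softmax ratio.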