Paper information - Neural networks for pattern recognition

Neural networks for pattern recognition

From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

Christopher M. Bishop
