Maximum likelihood neural approximation in presence of additive colored noise

In many practical situations, the noise samples may be correlated. In this case, estimating the noise parameters can improve the approximation. An estimate of the noise structure can also provide a stopping criterion for constructive neural networks: to avoid overfitting, the network construction procedure must stop once the residual can be considered noise. Knowledge of the noise can be used to "whiten" the residual, so that a correlation hypothesis test determines whether the network should keep growing. In this paper, assuming a Gaussian noise model, we study the problem of multi-output nonlinear regression with a multilayer perceptron (MLP) when the noise in each output is a correlated autoregressive time series and is spatially correlated with the noise in the other outputs. We show that the noise parameters can be determined simultaneously with the network weights and used to construct an estimator with a smaller variance, which improves the network's generalization performance. Moreover, if a constructive procedure is used to build the network, the estimated parameters can be used to stop the procedure.
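
The whitening-then-testing idea for the stopping criterion can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a single output, a first-order autoregressive (AR(1)) noise model, and a simple lag-1 autocorrelation check, whereas the paper estimates the noise parameters jointly with the network weights by maximum likelihood. All function names below are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

def estimate_ar1(residual):
    """Least-squares estimate of phi in r[t] = phi * r[t-1] + e[t] (assumed AR(1) model)."""
    r = np.asarray(residual, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[1:], r[:-1]) / np.dot(r[:-1], r[:-1]))

def whiten_ar1(residual, phi):
    """Remove the AR(1) correlation: w[t] = r[t] - phi * r[t-1]."""
    r = np.asarray(residual, dtype=float)
    r = r - r.mean()
    return r[1:] - phi * r[:-1]

def looks_white(x, z=1.96):
    """Approximate 95% test: for white noise the lag-1 autocorrelation
    is roughly N(0, 1/n), so small values are consistent with whiteness."""
    return abs(lag1_autocorr(x)) < z / np.sqrt(len(x))

# Synthetic "residual" that is pure AR(1) noise (illustrative values only).
rng = np.random.default_rng(0)
n = 2000
e = rng.standard_normal(n)
noise = np.empty(n)
noise[0] = e[0]
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + e[t]

phi_hat = estimate_ar1(noise)
whitened = whiten_ar1(noise, phi_hat)

# A constructive procedure would stop growing the network
# once the whitened residual passes the whiteness check.
stop_growing = looks_white(whitened)
```

In this sketch the raw residual is strongly correlated and would wrongly suggest that structure remains to be fitted; only after whitening with the estimated noise parameter does the test compare the residual against the white-noise hypothesis.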
