Architecture Selection Strategies for Neural Networks: Application to Corporate Bond Rating Predicti

We propose strategies for selecting a good neural network architecture for modeling any specific data set. Our approach involves efficiently searching the space of possible architectures and selecting a "best" architecture based on estimates of generalization performance. Since an exhaustive search over the space of architectures is computationally infeasible, we propose heuristic strategies which dramatically reduce the search complexity. These employ directed search algorithms, including selecting the number of nodes via sequential network construction (SNC), sensitivity-based pruning (SBP) of inputs, and optimal brain damage (OBD) pruning of weights. A selection criterion, the estimated generalization performance or prediction risk, is used to guide the heuristic search and to choose the final network. Both predicted squared error (PSE) and nonlinear cross-validation (NCV) are used for estimating the prediction risk from the available data. We apply these heuristic search and prediction risk estimation techniques to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by a limited set of data and by the lack of a complete a priori model which could be used to impose structure on the network architecture.
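The idea of using an estimate of prediction risk to choose among candidate architectures can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the network is a stand-in with fixed random tanh hidden units and a least-squares output layer (rather than a gradient-trained network), the risk estimate is plain v-fold cross-validation (rather than NCV, whose details the abstract does not give), and the `pse` helper uses the commonly cited form ASE + 2σ²p/N, with the noise variance supplied by the caller. All function names are hypothetical.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test, n_hidden, seed=0):
    """Stand-in network (assumption): random tanh hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X_train.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    # Hidden activations plus a constant column for the output bias.
    H_train = np.column_stack([np.tanh(X_train @ W + b), np.ones(len(X_train))])
    H_test = np.column_stack([np.tanh(X_test @ W + b), np.ones(len(X_test))])
    beta, *_ = np.linalg.lstsq(H_train, y_train, rcond=None)
    return H_test @ beta


def cv_risk(X, y, n_hidden, n_folds=5):
    """Plain v-fold cross-validation estimate of the mean squared prediction error."""
    idx = np.arange(len(X))
    sq_err = 0.0
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        pred = fit_predict(X[train], y[train], X[held_out], n_hidden)
        sq_err += np.sum((pred - y[held_out]) ** 2)
    return sq_err / len(X)


def pse(ase, n_params, n_samples, noise_var):
    """Predicted squared error in the assumed form ASE + 2 * sigma^2 * p / N."""
    return ase + 2.0 * noise_var * n_params / n_samples


def select_hidden_units(X, y, candidates, n_folds=5):
    """Return the candidate hidden-layer size with the lowest estimated risk."""
    risks = {h: cv_risk(X, y, h, n_folds) for h in candidates}
    best = min(risks, key=risks.get)
    return best, risks


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(100, 2))
    y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=100)
    best, risks = select_hidden_units(X, y, candidates=[1, 2, 4, 8])
    print("selected hidden units:", best)
    print("estimated risks:", risks)
```

This sketch evaluates each candidate size independently. A sequential construction scheme would instead grow the network step by step and reuse the smaller network's weights, which the stand-in model here does not attempt.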
