Local Dimensionality Reduction

If globally high-dimensional data has only locally low-dimensional distributions, it is advantageous to perform a local dimensionality reduction before further processing the data. In this paper we examine several techniques for local dimensionality reduction in the context of locally weighted linear regression. As candidates, we derive local versions of factor analysis regression, principal component regression, principal component regression on joint distributions, and partial least squares regression. After outlining the statistical bases of these methods, we perform Monte Carlo simulations to evaluate their robustness with respect to violations of their statistical assumptions. One surprising outcome is that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of our candidate techniques.
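The headline result, that locally weighted partial least squares (LWPLS) is on average the most robust of the candidate techniques, is easiest to see in code. The following is a minimal sketch of a weighted PLS1 fit-and-predict step with a Gaussian locality kernel, not the paper's exact formulation; the function name `lwpls_fit_predict`, the kernel width `h`, and `n_components` are illustrative choices.

```python
# Hedged sketch of locally weighted PLS regression (weighted PLS1 recursion).
import numpy as np

def lwpls_fit_predict(X, y, x_query, h=1.0, n_components=2):
    """Predict y at x_query with a locally weighted PLS1 model.

    X : (n, d) inputs, y : (n,) outputs, x_query : (d,) query point,
    h : Gaussian kernel width, n_components : number of PLS projections.
    """
    # Locality weights from a Gaussian kernel centred at the query point.
    w = np.exp(-0.5 * np.sum((X - x_query) ** 2, axis=1) / h ** 2)
    W = w / w.sum()

    # Weighted means; start from weight-centred inputs and output residuals.
    x_mean = W @ X
    beta0 = W @ y
    Z = X - x_mean              # deflated inputs
    res = y - beta0             # output residuals
    y_hat = beta0
    z_q = x_query - x_mean      # deflated query point

    for _ in range(n_components):
        # Projection direction: weighted input-output covariance.
        u = Z.T @ (w * res)
        u /= np.linalg.norm(u) + 1e-12
        s = Z @ u                               # scores along u
        denom = w @ (s ** 2) + 1e-12
        beta = (w * s) @ res / denom            # 1-D regression on the scores
        p = Z.T @ (w * s) / denom               # input loadings for deflation

        # Pass the query point through the same projection.
        s_q = z_q @ u
        y_hat += beta * s_q

        # Deflate inputs, residuals, and query before the next component.
        Z = Z - np.outer(s, p)
        res = res - beta * s
        z_q = z_q - s_q * p

    return y_hat
```

Each component reduces the regression to a single direction chosen by the weighted input-output covariance, which is why the method degrades gracefully when the local input distribution is ill-conditioned.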
