Dataset Shift in Machine Learning

Dataset shift is a common problem in predictive modeling that occurs when the joint distribution of inputs and outputs differs between training and test stages. Covariate shift, a particular case of dataset shift, occurs when only the input distribution changes, while the conditional distribution of outputs given inputs stays the same. Dataset shift is present in most practical applications, for reasons ranging from the bias introduced by experimental design to the irreproducibility of the testing conditions at training time. For example, an email spam filter may fail to recognize spam that differs in form from the spam it was built on. Despite this, and despite the attention given to the apparently similar problems of semi-supervised learning and active learning, dataset shift had received relatively little attention in the machine learning community until recently. This volume offers an overview of current efforts to deal with dataset and covariate shift. The chapters offer a mathematical and philosophical introduction to the problem, relate dataset shift to transfer learning, transduction, local learning, active learning, and semi-supervised learning, provide theoretical views of dataset and covariate shift (including decision-theoretic and Bayesian perspectives), and present algorithms for covariate shift.

Contributors: Shai Ben-David, Steffen Bickel, Karsten Borgwardt, Michael Brückner, David Corfield, Amir Globerson, Arthur Gretton, Lars Kai Hansen, Matthias Hein, Jiayuan Huang, Takafumi Kanamori, Klaus-Robert Müller, Sam Roweis, Neil Rubens, Tobias Scheffer, Marcel Schmittfull, Bernhard Schölkopf, Hidetoshi Shimodaira, Alex Smola, Amos Storkey, Masashi Sugiyama, Choon Hui Teo.

Neural Information Processing series
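The covariate shift setting described above (the input distribution moves between training and test, while p(y | x) is unchanged) can be illustrated with importance weighting, one standard correction in which each training loss term is multiplied by w(x) = p_test(x) / p_train(x). The following is a minimal toy sketch, not an algorithm taken from the book: the cubic target function, the Gaussian input distributions, the noise level, and all helper names (`simulate`, `fit_weighted_line`, `true_function`, `gaussian_pdf`) are illustrative assumptions. The model is deliberately misspecified (a straight line fit to a cubic), since that is the case where weighting changes the fitted model, and the density ratio is computed from known densities, whereas in practice it has to be estimated from samples.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std**2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def true_function(x):
    # Assumed target: E[y | x] is identical at training and test time;
    # only the input distribution moves (covariate shift).
    return -x + x ** 3


def fit_weighted_line(x, y, weights):
    """Weighted least squares for y ~ a + b*x; returns the array [a, b]."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(weights)
    # Minimizing sum_i w_i (y_i - X_i beta)^2 equals ordinary least squares
    # on rows scaled by sqrt(w_i).
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


def simulate(n_train=500, n_test=10_000, noise_std=0.3, seed=0):
    """Return the test MSE of an unweighted and an importance-weighted line fit."""
    rng = np.random.default_rng(seed)
    # Hypothetical setup: training inputs ~ N(0.5, 0.5^2), test inputs ~ N(0, 0.3^2).
    x_tr = rng.normal(0.5, 0.5, n_train)
    x_te = rng.normal(0.0, 0.3, n_test)
    y_tr = true_function(x_tr) + rng.normal(0.0, noise_std, n_train)
    y_te = true_function(x_te) + rng.normal(0.0, noise_std, n_test)

    # Importance weights p_test(x) / p_train(x), using the known densities
    # (an assumption: real applications must estimate this ratio).
    w = gaussian_pdf(x_tr, 0.0, 0.3) / gaussian_pdf(x_tr, 0.5, 0.5)

    results = {}
    for name, weights in [("unweighted", np.ones(n_train)), ("importance-weighted", w)]:
        a, b = fit_weighted_line(x_tr, y_tr, weights)
        results[name] = float(np.mean((y_te - (a + b * x_te)) ** 2))
    return results


if __name__ == "__main__":
    for name, mse in simulate().items():
        print(f"{name:>20s} test MSE: {mse:.3f}")
```

Running the script prints the test error of both fits. The importance-weighted line is expected to have the lower error, because ordinary least squares fits the line to the region where training inputs are dense, while the weights shift that emphasis toward the region where test inputs actually fall.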