Prediction and outlier detection in classification problems

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set $C(x)$ as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible while also detecting outliers $x$, for which the method returns no prediction (corresponding to $C(x)$ equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
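To make the conformal construction concrete, here is a minimal split-conformal sketch of label-conditional prediction sets with an abstain/outlier option. It is an illustration of the general idea only: the per-class score (a fitted class probability from a generic classifier) is a simple stand-in, not the BCOPS-optimized score, which additionally uses the unlabeled test sample to balance classification accuracy against outlier detection; the function names and the simplified quantile rule (omitting the usual finite-sample $(n_k+1)$ correction) are assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_conformal_classifier(X_train, y_train, alpha=0.05, seed=0):
    """Fit a score model and per-class calibration thresholds (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    idx = rng.permutation(n)
    fit_idx, cal_idx = idx[: n // 2], idx[n // 2:]

    # Generic score model; BCOPS would instead learn a score tuned on the
    # unlabeled test sample to trade off coverage against outlier detection.
    model = LogisticRegression(max_iter=1000).fit(X_train[fit_idx], y_train[fit_idx])

    thresholds = {}
    for j, k in enumerate(model.classes_):
        cal_k = cal_idx[y_train[cal_idx] == k]
        # Score = fitted probability of class k on held-out class-k points.
        scores_k = model.predict_proba(X_train[cal_k])[:, j]
        # Include class k whenever the test score exceeds this quantile,
        # giving roughly (1 - alpha) coverage for class-k test points.
        thresholds[k] = np.quantile(scores_k, alpha)
    return model, thresholds

def predict_set(model, thresholds, x):
    """Return the prediction set C(x); an empty set flags x as an outlier."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    return {k for j, k in enumerate(model.classes_) if probs[j] >= thresholds[k]}
```

A test point resembling none of the training classes tends to fall below every per-class threshold, so `predict_set` returns the empty set and the point is flagged rather than forced into a label.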
