Prediction and outlier detection in classification problems

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set $C(x)$ as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible while also detecting outliers $x$, for which the method returns no prediction (corresponding to $C(x)$ equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
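To make the conformal construction concrete, here is a minimal split-conformal sketch of label-conditional prediction sets with an abstain/outlier option. It is an illustration of the general idea only: the per-class score (a fitted class probability from a generic classifier) is a simple stand-in, not the BCOPS-optimized score, which additionally uses the unlabeled test sample to balance classification accuracy against outlier detection; the function names and the simplified quantile rule (omitting the usual finite-sample $(n_k+1)$ correction) are assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_conformal_classifier(X_train, y_train, alpha=0.05, seed=0):
    """Fit a score model and per-class calibration thresholds (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    idx = rng.permutation(n)
    fit_idx, cal_idx = idx[: n // 2], idx[n // 2:]

    # Generic score model; BCOPS would instead learn a score tuned on the
    # unlabeled test sample to trade off coverage against outlier detection.
    model = LogisticRegression(max_iter=1000).fit(X_train[fit_idx], y_train[fit_idx])

    thresholds = {}
    for j, k in enumerate(model.classes_):
        cal_k = cal_idx[y_train[cal_idx] == k]
        # Score = fitted probability of class k on held-out class-k points.
        scores_k = model.predict_proba(X_train[cal_k])[:, j]
        # Include class k whenever the test score exceeds this quantile,
        # giving roughly (1 - alpha) coverage for class-k test points.
        thresholds[k] = np.quantile(scores_k, alpha)
    return model, thresholds

def predict_set(model, thresholds, x):
    """Return the prediction set C(x); an empty set flags x as an outlier."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    return {k for j, k in enumerate(model.classes_) if probs[j] >= thresholds[k]}
```

A test point resembling none of the training classes tends to fall below every per-class threshold, so `predict_set` returns the empty set and the point is flagged rather than forced into a label.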
