Coping with Label Shift via Distributionally Robust Optimisation

The label shift problem refers to the supervised learning setting where the training and test label distributions do not match. Existing work addressing label shift usually assumes access to an \emph{unlabelled} test sample, which can be used to estimate the test label distribution and then train a suitably re-weighted classifier. While approaches based on this idea have proven effective, their scope is limited: it is not always feasible to access the target domain, and they require repeated retraining if the model is to be deployed in \emph{multiple} test environments. Can one instead learn a \emph{single} classifier that is robust to arbitrary label shifts from a broad family? In this paper, we answer this question by proposing a model that minimises an objective based on distributionally robust optimisation (DRO). We then design and analyse a gradient descent-proximal mirror ascent algorithm, tailored for large-scale problems, to optimise the proposed objective. Finally, through experiments on CIFAR-100 and ImageNet, we show that our technique can significantly improve performance over a number of baselines in settings where label shift is present.
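To make the setup concrete, a schematic form of such a DRO objective is sketched below; the particular divergence and radius are illustrative assumptions, not necessarily the paper's exact formulation:
\[
\min_{\theta} \;\; \max_{\pi \in \Delta_L \,:\, D\left(\pi \,\|\, \pi_{\mathrm{tr}}\right) \le r} \;\; \sum_{y=1}^{L} \pi_y \, \mathbb{E}_{x \sim P(\cdot \mid y)}\big[\ell(f_\theta(x), y)\big],
\]
where $\pi_{\mathrm{tr}}$ is the training label distribution, $\Delta_L$ is the probability simplex over $L$ labels, $D$ is a divergence (e.g., KL), and $r$ controls how large a label shift the classifier must withstand. The outer minimisation over $\theta$ is amenable to (stochastic) gradient descent, while the inner maximisation over $\pi$, a constrained problem on the simplex, is naturally handled by proximal mirror ascent.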
