Universal Learning over Related Distributions and Adaptive Graph Transduction

The basic assumption that training and test data are drawn from the same distribution is often violated in practice. In this paper, we propose a single framework that covers a variety of scenarios of learning under "different but related distributions." Explicit examples include (a) sample selection bias between training and test data, (b) transfer learning, where the target domain contains no labeled data, and (c) noisy or uncertain training data. The main motivation is that a single approach should ideally solve as many of these problems as possible. The proposed solution extends graph transduction with the maximum margin principle applied over unlabeled data. Under reasonable assumptions, the error of the proposed method is bounded even when the training and test distributions differ. Experimental results demonstrate that the proposed method improves on traditional graph transduction by as much as 15% in accuracy and AUC across all common situations of distribution difference. Most importantly, it outperforms, by up to 10% in accuracy, several state-of-the-art approaches designed for specific categories of distribution difference, e.g., BRSD [8] for sample selection bias and CDSC [3] for transfer learning. The main claim is that adaptive graph transduction is a general and competitive method for handling distribution differences implicitly, without needing to know or worry about their exact type. These include at least sample selection bias, transfer learning, and uncertainty mining, as well as similar scenarios not yet studied. The source code and datasets are available from the authors.
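The abstract describes the approach only at a high level: graph transduction combined with a maximum-margin criterion evaluated on unlabeled data. The sketch below is one plausible reading of that idea, not the authors' exact algorithm. It pairs harmonic-function transduction on a kNN graph (in the style of Zhu et al. [5]) with a hypothetical selection loop that resamples labeled subsets and keeps the hypothesis whose average unsigned margin on the unlabeled points is largest. The function names, the kNN/RBF graph construction, and the resampling scheme are all illustrative assumptions.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    def harmonic_transduction(X, y, labeled_idx, k=10, sigma=1.0):
        """Harmonic-function graph transduction in the style of Zhu et al. [5].

        X           : (n, d) features for labeled + unlabeled points
        y           : (n,) labels in {-1, +1}; entries outside labeled_idx ignored
        labeled_idx : indices of the labeled points
        Returns soft scores f for all n points; signs give predictions.
        """
        n = X.shape[0]
        # Symmetrized kNN graph with RBF edge weights (an assumed construction).
        D = kneighbors_graph(X, k, mode="distance", include_self=False).toarray()
        W = np.where(D > 0, np.exp(-D**2 / (2 * sigma**2)), 0.0)
        W = np.maximum(W, W.T)
        L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian

        u_idx = np.setdiff1d(np.arange(n), labeled_idx)
        # Harmonic solution: solve L_uu f_u = -L_ul y_l for the unlabeled scores.
        # (Assumes each graph component contains at least one labeled point.)
        f_u = np.linalg.solve(L[np.ix_(u_idx, u_idx)],
                              -L[np.ix_(u_idx, labeled_idx)] @ y[labeled_idx])
        f = y.astype(float)
        f[u_idx] = f_u
        return f, u_idx

    def adaptive_graph_transduction(X, y, labeled_idx, n_trials=20, m=10, seed=0):
        """Hypothetical margin-based selection loop: resample small labeled
        subsets, transduce with each, and keep the labeling with the largest
        average unsigned margin |f| over the unlabeled points."""
        rng = np.random.default_rng(seed)
        best_f, best_margin = None, -np.inf
        for _ in range(n_trials):
            sub = rng.choice(labeled_idx, size=min(m, len(labeled_idx)),
                             replace=False)
            f, u_idx = harmonic_transduction(X, y, sub)
            margin = np.abs(f[u_idx]).mean()  # margin measured on unlabeled data
            if margin > best_margin:
                best_margin, best_f = margin, f
        return np.sign(best_f), best_margin

The point of computing the margin on the unlabeled points rather than on the training sample is that, when the two distributions differ, the unlabeled (test-distribution) data is where the hypothesis will actually be evaluated; a labeling that separates those points confidently is preferred regardless of which specific kind of distribution difference caused the mismatch.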

[1] Koby Crammer, et al. Learning Bounds for Domain Adaptation, 2007, NIPS.

[2] Ian Davidson, et al. On Sample Selection Bias and Its Efficient Correction via Model Averaging and Unlabeled Examples, 2007, SDM.

[3] Qiang Yang, et al. Spectral Domain-Transfer Learning, 2008, KDD.

[4] François Laviolette, et al. A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning, 2008, NIPS.

[5] J. Lafferty, et al. Combining Active Learning and Semi-supervised Learning Using Gaussian Fields and Harmonic Functions, 2003, ICML.

[6] Ronald Rosenfeld, et al. Semi-supervised Learning with Graphs, 2005.

[7] Charu C. Aggarwal, et al. On Density Based Transforms for Uncertain Data Mining, 2007, ICDE.

[8] Philip S. Yu, et al. Type-Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing, 2008, SDM.

[9] L. Ghaoui, et al. Robust Classification with Interval Data, 2003.

[10] Shih-Fu Chang, et al. Graph Transduction via Alternating Minimization, 2008, ICML.

[11] Fabio Roli, et al. Multiple Classifier Systems: 9th International Workshop (MCS 2010) Proceedings, 2010, MCS.

[12] Michael Stonebraker, et al. The Morgan Kaufmann Series in Data Management Systems, 1999.

[13] Ian H. Witten, et al. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, 2005, Morgan Kaufmann.

[14] Ian Witten, et al. Data Mining, 2000.

[15] Raymond J. Mooney, et al. Experiments on Ensembles with Missing and Noisy Data, 2004, MCS.

[16] Ian H. Witten, et al. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, 1999.

[17] Motoaki Kawanabe, et al. Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation, 2007, NIPS.

[18] Qiang Yang, et al. Transfer Learning via Dimensionality Reduction, 2008, AAAI.