Cross domain distribution adaptation via kernel mapping

When labeled examples are scarce and difficult to obtain, transfer learning uses knowledge from a source domain to improve learning accuracy in the target domain. However, existing approaches assume that the marginal and conditional probabilities of the source and target domains are directly related, an assumption that often fails to hold in the original feature space or in its linear transformations. To address this problem, we propose an adaptive kernel approach that maps the marginal distributions of the source-domain and target-domain data into a common kernel space, together with a sample selection strategy that draws the conditional probabilities of the two domains closer. We formally show that in the kernel-mapped space, the difference between the two domains' distributions is bounded, and that the prediction error of the proposed approach is also bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web-page rating. In particular, it achieves around 10% higher accuracy than competing approaches on the text categorization problem. The source code and datasets are available from the authors.
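For intuition only, below is a minimal Python sketch of the two ideas the abstract describes: mapping both domains into one shared kernel space (here via kernel PCA with an RBF kernel) and selecting source samples to better match the target's conditional distribution. This is an illustration under assumptions, not the authors' algorithm: the function name `kernel_map_adapt`, the kernel-PCA choice, the `keep_fraction` parameter, and the confidence-based selection proxy are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

def kernel_map_adapt(X_src, y_src, X_tgt, n_components=20, keep_fraction=0.8):
    """Hypothetical sketch: shared kernel mapping + source-sample selection."""
    # (1) Fit a single kernel map on the union of both domains so that the
    # source and target marginal distributions are represented in one
    # common kernel space.
    kpca = KernelPCA(n_components=n_components, kernel="rbf",
                     gamma=1.0 / X_src.shape[1])
    Z = kpca.fit_transform(np.vstack([X_src, X_tgt]))
    Z_src, Z_tgt = Z[:len(X_src)], Z[len(X_src):]

    # (2) Sample selection (a crude proxy for the paper's strategy):
    # train a provisional classifier on all source data, score each
    # source example by the predicted probability of its own label,
    # and keep only the best-fitting fraction.
    clf = LogisticRegression(max_iter=1000).fit(Z_src, y_src)
    cols = np.searchsorted(clf.classes_, y_src)  # map labels to proba columns
    conf = clf.predict_proba(Z_src)[np.arange(len(y_src)), cols]
    keep = np.argsort(conf)[-int(keep_fraction * len(y_src)):]

    # Retrain on the selected subset and label the target domain.
    final = LogisticRegression(max_iter=1000).fit(Z_src[keep], y_src[keep])
    return final.predict(Z_tgt)
```

Calling `kernel_map_adapt(X_src, y_src, X_tgt)` returns predicted labels for the unlabeled target data. The selection step is only a stand-in: the abstract does not specify how the conditional probabilities are aligned, so the confidence-filtering heuristic above should not be read as the paper's method.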
