A Survey on Transfer Learning

A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve learning performance by avoiding much of the expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. We also discuss the relationship between transfer learning and related machine learning techniques such as domain adaptation, multitask learning, sample selection bias, and covariate shift. Finally, we explore some potential future issues in transfer learning research.
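To make the covariate-shift setting concrete, the sketch below shows importance weighting. This is one standard correction for covariate shift, offered here as an illustration and not as a method taken from this survey. Under covariate shift the input distributions differ (p_S(x) != p_T(x)) while the labeling rule P(y|x) is shared. Reweighting each labeled source example by w(x) = p_T(x) / p_S(x) therefore makes the average weighted source loss an estimate of the loss on the target distribution. Everything specific here is a hypothetical assumption: the Gaussian source N(0, 1) and target N(1, 1), the labeling rule y = x, the fixed model h(x) = 0.5x, and the function names. The densities are also assumed known; in practice the density ratio must itself be estimated from data.

```python
import math
import random

# Hypothetical setup (an assumption, not from the survey): inputs come from
# N(0, 1) in the source domain and N(1, 1) in the target domain, and the
# labeling rule y = x is shared by both (the covariate-shift assumption).
SOURCE_MEAN, TARGET_MEAN, STD = 0.0, 1.0, 1.0


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weight(x):
    """w(x) = p_target(x) / p_source(x); both densities are assumed known."""
    return normal_pdf(x, TARGET_MEAN, STD) / normal_pdf(x, SOURCE_MEAN, STD)


def squared_loss(x, y):
    """Squared loss of a fixed, hypothetical model h(x) = 0.5 * x."""
    return (y - 0.5 * x) ** 2


def estimate_risks(n=100_000, seed=0):
    """Return (unweighted, importance-weighted) risk estimates from source data."""
    rng = random.Random(seed)
    xs = [rng.gauss(SOURCE_MEAN, STD) for _ in range(n)]  # labeled source inputs only
    ys = list(xs)  # labels follow the shared rule y = x
    losses = [squared_loss(x, y) for x, y in zip(xs, ys)]
    unweighted = sum(losses) / n  # estimates the *source* risk
    weighted = sum(importance_weight(x) * loss for x, loss in zip(xs, losses)) / n
    return unweighted, weighted


if __name__ == "__main__":
    unweighted, weighted = estimate_risks()
    print(f"unweighted: {unweighted:.3f}  importance-weighted: {weighted:.3f}")
```

Running it should print an unweighted estimate close to the source risk and a weighted estimate close to the target risk. The closed-form values are 0.25 * E_S[x^2] = 0.25 and 0.25 * E_T[x^2] = 0.25 * 2 = 0.5. Ignoring the shift, that is, trusting the plain source average, would therefore underestimate the target error by about half.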
