Learn on Source, Refine on Target: A Model Transfer Learning Framework with Random Forests

We propose novel model transfer-learning methods that refine a decision forest model $M$ learned within a “source” domain using a training set sampled from a “target” domain, assumed to be a variation of the source. We present two random forest transfer algorithms. The first algorithm searches greedily for locally optimal modifications of each tree structure by attempting to expand or reduce the tree around individual nodes. The second algorithm leaves the structure unchanged and modifies only the parameters (thresholds) associated with decision nodes. We also propose combining both methods in an ensemble that contains the union of the two forests. The proposed methods exhibit impressive experimental results over a range of problems.
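
As an illustration only, the idea behind the second algorithm (keep each tree's structure and split features, re-fit only the thresholds on target data) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` class, the helper names, the use of Gini impurity as the threshold-selection criterion, and the majority-vote relabeling of leaves are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Binary decision-tree node (hypothetical). A leaf has `label` set."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values: List[float], labels: List[int], default: float) -> float:
    """Pick the midpoint between distinct values that minimizes weighted Gini.

    Falls back to `default` (the source threshold) when no split is possible.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return default
    best_t, best_score = default, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def refine_thresholds(node: Node, X: List[List[float]], y: List[int]) -> None:
    """Re-fit thresholds top-down on target data; structure and features stay fixed."""
    if not y:
        return  # no target samples reach this subtree: keep the source parameters
    if node.label is not None:
        node.label = max(set(y), key=y.count)  # assumption: relabel leaf by majority
        return
    values = [row[node.feature] for row in X]
    node.threshold = best_threshold(values, y, node.threshold)
    left_idx = [i for i, v in enumerate(values) if v <= node.threshold]
    right_idx = [i for i, v in enumerate(values) if v > node.threshold]
    refine_thresholds(node.left, [X[i] for i in left_idx], [y[i] for i in left_idx])
    refine_thresholds(node.right, [X[i] for i in right_idx], [y[i] for i in right_idx])


def predict(node: Node, row: Sequence[float]) -> int:
    """Route a sample down the tree and return the leaf label."""
    while node.label is None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


# Source-domain stump: splits feature 0 at 0.5.
source_tree = Node(feature=0, threshold=0.5, left=Node(label=0), right=Node(label=1))
# Small target sample in which the class boundary has shifted.
X_target = [[2.0], [2.5], [3.5], [4.0]]
y_target = [0, 0, 1, 1]
refine_thresholds(source_tree, X_target, y_target)
```

In this toy case the tree keeps its single split on feature 0, and only the threshold moves to fit the shifted target data. A forest-level version would apply `refine_thresholds` to every tree of the source forest independently.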
