Multitask Learning

Multitask Learning (MTL) is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation, so that what is learned for each task can help the other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need for supervisory signals, and demonstrates MTL in three domains. We explain how MTL works and show that there are many opportunities for it in real domains. We present an algorithm and results for MTL with case-based methods such as k-nearest neighbor and kernel regression, and sketch an algorithm for MTL in decision trees. Because MTL works, applies to many different kinds of domains, and can be used with different learning algorithms, we conjecture that there will be many opportunities for its use on real-world problems.
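The idea of learning tasks in parallel over a shared representation can be illustrated with a minimal sketch. The code below is not the paper's implementation. It assumes a single tanh hidden layer shared by all tasks, one linear output per task, squared-error loss, and plain full-batch gradient descent. The function name `train_mtl`, the synthetic tasks, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def train_mtl(X, Y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer net whose hidden layer is shared by all tasks.

    X: (n, d) inputs. Y: (n, t) targets, one column per task.
    Returns the weights and the mean squared error recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    t = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, t))
    b2 = np.zeros(t)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # shared representation used by every task
        P = H @ W2 + b2            # one linear output unit per task
        E = P - Y
        losses.append(float(np.mean(E ** 2)))
        # Backpropagation: the error signals of all tasks are summed
        # into the gradient of the shared weights W1 and b1.
        dP = 2.0 * E / (n * t)
        dW2 = H.T @ dP
        db2 = dP.sum(axis=0)
        dZ = (dP @ W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


# Two synthetic, related tasks (an assumption for this demo):
# both depend on the same hidden feature x0 + x1.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
feature = X[:, 0] + X[:, 1]
Y = np.column_stack([np.sin(feature), np.cos(feature)])

params, losses = train_mtl(X, Y)
print("first-epoch MSE:", losses[0])
print("last-epoch MSE:", losses[-1])
```

Because both output units read from the same hidden layer, a hidden unit that becomes useful for one task is immediately available to the other. Training each task in its own network would remove that sharing.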
