Domain Adaptation with Multiple Sources

This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ε are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ε with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ε. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
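For concreteness, the two families of combining rules contrasted above can be sketched as follows (a sketch only; the notation $D_1, \ldots, D_k$ for the source distributions, $h_1, \ldots, h_k$ for the given source hypotheses, and $z$ for a point in the simplex $\Delta$ is assumed here, not quoted from the paper). A standard convex combination predicts

    $h_z(x) = \sum_{i=1}^{k} z_i \, h_i(x)$,

whereas a distribution weighted combining rule additionally reweights each source hypothesis by how likely the input point is under its source distribution:

    $h_z(x) = \sum_{i=1}^{k} \frac{z_i D_i(x)}{\sum_{j=1}^{k} z_j D_j(x)} \, h_i(x)$.

On this reading, the main result states that for any fixed target function there exists $z \in \Delta$ such that the second rule incurs loss at most ε with respect to every target mixture $\sum_i \lambda_i D_i$ of the source distributions.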
