With the explosion of user-generated Web 2.0 content in the form of blogs, wikis, and discussion forums, the Internet has rapidly become a massive, dynamic repository of public opinion on an unbounded range of topics. A key enabler of opinion extraction and summarization is sentiment classification: the task of automatically identifying whether a given piece of text expresses a positive or negative opinion towards a topic of interest. Building high-quality sentiment classifiers with standard text categorization methods is challenging because labeled data is scarce in the target domain. In this paper, we consider the problem of cross-domain sentiment analysis: can one, for instance, download rated movie reviews from rottentomatoes.com or IMDb discussion forums, learn the linguistic expressions and sentiment-laden terms that generally characterize opinionated reviews, and then successfully transfer this knowledge to the target domain, thereby building high-quality sentiment models without manual effort? We outline a novel sentiment transfer mechanism based on constrained non-negative matrix tri-factorizations of term-document matrices in the source and target domains, and report preliminary results with this approach.
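To make the central object concrete, the following is a minimal sketch of an unconstrained non-negative matrix tri-factorization X ≈ F S Gᵀ of a term-document matrix, fitted with standard multiplicative updates. It is not the constrained transfer mechanism outlined above: the function names, the toy matrix, the rank k = 2, and the interpretation of F and G as term-sentiment and document-sentiment factors are all assumptions made for illustration.

```python
import numpy as np

def tri_factorize(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (terms x docs) as F @ S @ G.T.

    Illustrative sketch only: plain multiplicative updates for the squared
    Frobenius loss, with no lexicon or orthogonality constraints.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    F = rng.random((n_terms, k)) + eps   # term factors (assumed: term-sentiment)
    S = rng.random((k, k)) + eps         # small k x k association matrix
    G = rng.random((n_docs, k)) + eps    # document factors (assumed: doc-sentiment)
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive part),
        # which keeps every factor non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return float(np.linalg.norm(X - F @ S @ G.T))

# Hypothetical toy term-document counts: rows are terms, columns are documents.
# The first three terms occur only in documents 0-1, the last three in 2-3.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

F0, S0, G0 = tri_factorize(X, k=2, n_iter=0)   # random initialisation only
F, S, G = tri_factorize(X, k=2, n_iter=200)    # same seed, after fitting
err_init = reconstruction_error(X, F0, S0, G0)
err_fit = reconstruction_error(X, F, S, G)
doc_clusters = G.argmax(axis=1)                # dominant factor per document
```

In a cross-domain setting, one could imagine estimating the term factor F on the source domain and using it to constrain or initialise the factorization of the target domain's matrix; how the constraints are actually imposed is specific to the method and is not reproduced in this sketch.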