Abstract

Twitter, a popular social network, presents great opportunities for on-line machine learning research. However, previous research has focused almost entirely on learning from passively collected data. We study the problem of learning to acquire followers through normative user behavior, as opposed to the mass-following policies applied by many bots. We formalize the problem as a contextual bandit problem, in which retweeting content is the action chosen and each tweet (content) is accompanied by context. We design reward signals based on the change in followers. The results of our month-long experiment with 60 agents suggest that (1) aggregating experience across agents can adversely impact prediction accuracy and (2) the Twitter community's response to different actions is non-stationary. Our findings suggest that actively learning on-line can provide deeper insights into how to attract followers than machine learning over passively collected data alone.

Keywords: Reinforcement Learning, On-line Learning, Contextual Bandits, Twitter

Acknowledgements

The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme.
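To make the contextual-bandit formulation concrete, the following is a minimal sketch, assuming a shared linear reward model (LinUCB-style) rather than the authors' actual implementation, which the abstract does not specify. Candidate tweets are arms, each represented by a hypothetical feature vector, and the reward is the observed change in follower count after retweeting.

    import numpy as np

    class LinUCB:
        """Shared-model LinUCB sketch: one ridge-regression reward model
        scores every candidate tweet's context features each round."""

        def __init__(self, n_features, alpha=1.0):
            self.alpha = alpha               # exploration strength
            self.A = np.eye(n_features)      # regularized Gram matrix X^T X + I
            self.b = np.zeros(n_features)    # accumulated X^T r

        def choose(self, contexts):
            """Pick the candidate tweet whose features maximize the
            upper-confidence-bound score."""
            A_inv = np.linalg.inv(self.A)
            theta = A_inv @ self.b           # current reward-model estimate
            scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                      for x in contexts]
            return int(np.argmax(scores))

        def update(self, x, reward):
            """reward: e.g., the change in follower count observed some
            time after retweeting the chosen tweet."""
            self.A += np.outer(x, x)
            self.b += reward * x

A round of interaction might then look as follows, where the 8-dimensional features and the reward value are purely illustrative:

    agent = LinUCB(n_features=8)
    candidates = [np.random.rand(8) for _ in range(20)]  # hypothetical tweet features
    chosen = agent.choose(candidates)
    # ... retweet the chosen tweet, wait, then measure the follower delta ...
    agent.update(candidates[chosen], reward=2.0)

A shared model is used here because the arms (tweets) are transient, so per-arm statistics would never accumulate; the non-stationarity the abstract reports would further motivate discounting old observations, which this sketch omits.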