Clickbait Convolutional Neural Network

With the growth of online advertising, clickbait has become increasingly widespread. Clickbait frustrates users because the article content does not match the expectations set by the headline. As a result, clickbait detection has attracted growing attention. Traditional clickbait-detection methods rely on heavy feature engineering and cannot precisely distinguish clickbait from normal headlines, because headlines carry limited information. A convolutional neural network is well suited to clickbait detection: it uses pretrained Word2Vec embeddings to capture the semantics of headlines and applies different kernels to detect their various characteristics. However, different types of articles tend to draw users’ attention in different ways, and a pretrained Word2Vec model cannot distinguish among them. To address this issue, we propose a clickbait convolutional neural network (CBCNN) that considers not only the overall characteristics of headlines but also the specific characteristics of different article types. Our experimental results show that our method outperforms traditional clickbait-detection algorithms and the TextCNN model in terms of precision, recall, and accuracy.
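The convolutional step described above (word embeddings, kernels of several widths, one pooled feature per kernel) can be illustrated with a minimal, dependency-free sketch. It shows only the generic TextCNN-style feature extraction, not the CBCNN architecture itself, whose type-specific component is not detailed here. The names `conv_max_pool` and `headline_features`, the toy random vectors standing in for pretrained Word2Vec, and the random kernels are all assumptions made for illustration.

```python
import random


def conv_max_pool(embedded, kernel, bias=0.0):
    """Slide one kernel (width x dim) over the token vectors,
    apply ReLU, and keep the maximum activation (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, vector in zip(kernel, window)
            for w, x in zip(kernel_row, vector)
        )
        best = max(best, activation)
    return best


def headline_features(tokens, embeddings, kernels, dim):
    """Map a tokenized headline to one pooled feature per kernel."""
    unknown = [0.0] * dim  # zero vector for out-of-vocabulary tokens
    embedded = [embeddings.get(token, unknown) for token in tokens]
    # Zero-pad short headlines so the widest kernel fits at least once.
    widest = max(len(kernel) for kernel in kernels)
    embedded += [unknown] * max(0, widest - len(embedded))
    return [conv_max_pool(embedded, kernel) for kernel in kernels]


# Toy setup: random vectors stand in for pretrained Word2Vec (assumption).
rng = random.Random(0)
DIM = 4
tokens = "you will not believe what happened next".split()
embeddings = {t: [rng.uniform(-1, 1) for _ in range(DIM)] for t in tokens}

# Two kernels for each width 2, 3, and 4; in practice these are learned.
kernels = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)]
    for width in (2, 3, 4)
    for _ in range(2)
]

features = headline_features(tokens, embeddings, kernels, DIM)
print(len(features))  # one pooled feature per kernel
```

In a full model the pooled features would feed a classifier layer, and the kernels and embeddings would be trained rather than sampled at random.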
