Rewriting Meaningful Sentences via Conditional BERT Sampling and an application on fooling text classifiers

Most adversarial attack methods designed to deceive a text classifier change the classifier's prediction by modifying only a few words or characters. Few attack classifiers by rewriting the whole sentence, owing to the inherent difficulty of sentence-level rephrasing and the problem of defining criteria for legitimate rewriting. In this paper, we explore the problem of creating adversarial examples through sentence-level rewriting. We design a new sampling method, named ParaphraseSampler, to efficiently rewrite the original sentence in multiple ways. We then propose a new criterion for modification, called a sentence-level threat model. This criterion allows both word- and sentence-level changes, and can be adjusted independently along two dimensions: semantic similarity and grammatical quality. Experimental results show that many of these rewritten sentences are misclassified by the classifier. On all six datasets, our ParaphraseSampler achieves a higher attack success rate than our baseline.
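To make the idea of conditional BERT sampling concrete, the sketch below illustrates one common way to rewrite a sentence with a masked language model: each position is masked in turn and a replacement token is drawn from BERT's conditional distribution over the vocabulary. The abstract does not specify the details of ParaphraseSampler, so the function name, masking schedule, and temperature here are illustrative assumptions rather than the authors' implementation.

```python
# A minimal sketch of masked-LM resampling, assuming the standard
# HuggingFace transformers API. resample_sentence, num_rounds, and
# temperature are hypothetical names, not part of the paper.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def resample_sentence(sentence, num_rounds=2, temperature=1.0):
    """Iteratively mask each token and redraw it from BERT's conditional
    distribution, yielding a rewritten version of the input sentence."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    for _ in range(num_rounds):
        # Skip the [CLS] token at position 0 and [SEP] at the last position.
        for pos in range(1, ids.size(0) - 1):
            masked = ids.clone()
            masked[pos] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, pos] / temperature
            probs = torch.softmax(logits, dim=-1)
            # Sample a replacement token conditioned on the rest of the sentence.
            ids[pos] = torch.multinomial(probs, num_samples=1).item()
    return tokenizer.decode(ids[1:-1])

print(resample_sentence("the movie was surprisingly good and well acted"))
```

In a full attack pipeline, candidates produced this way would then be screened by the sentence-level threat model, e.g. keeping only rewrites whose sentence-embedding similarity to the original and whose grammatical quality both exceed chosen thresholds; the specific similarity and grammar measures are design choices not stated in the abstract.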
