Differentially Private Decoding in Large Language Models

Recent large-scale natural language processing (NLP) systems use a Large Language Model (LLM) pre-trained on massive, diverse corpora as a head start. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data, thereby potentially revealing private information processed during pre-training. This leakage can further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy-to-interpret, and computationally lightweight perturbation mechanism applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide a theoretical analysis showing that the proposed mechanism is differentially private, and experimental results demonstrating a privacy-utility trade-off.
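
The abstract does not spell out the perturbation itself, so the following is a minimal sketch of one decoding-stage perturbation consistent with the description: linearly interpolating the model's next-token distribution with the uniform distribution over the vocabulary before sampling. The function names and the mixing weight `lam` are illustrative assumptions, not the paper's API.

```python
import torch


def perturbed_next_token_distribution(logits: torch.Tensor, lam: float) -> torch.Tensor:
    """Mix the model's next-token distribution with the uniform distribution.

    lam in [0, 1] controls the trade-off: lam = 1 keeps the model's
    distribution unchanged, lam = 0 samples uniformly over the vocabulary.
    (Illustrative sketch; not the paper's exact mechanism.)
    """
    vocab_size = logits.shape[-1]
    p = torch.softmax(logits, dim=-1)            # model's next-token distribution
    u = torch.full_like(p, 1.0 / vocab_size)     # uniform distribution over tokens
    return lam * p + (1.0 - lam) * u             # perturbed distribution to sample from


def sample_next_token(logits: torch.Tensor, lam: float = 0.9) -> int:
    """Sample a token id from the perturbed distribution."""
    perturbed = perturbed_next_token_distribution(logits, lam)
    return int(torch.multinomial(perturbed, num_samples=1).item())
```

In this sketch, lam = 1 recovers ordinary sampling from the model, while lam = 0 samples tokens uniformly at random and reveals nothing about the training data; intermediate values trace out the kind of privacy-utility trade-off the abstract refers to, without any retraining of the model.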
