Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
[1] Mehdi Abbana Bennani, et al. Randomized Positional Encodings Boost Length Generalization of Transformers, 2023, ACL.
[2] David Sussillo, et al. Flexible multitask computation in recurrent networks utilizes shared dynamical motifs, 2022, bioRxiv.
[3] James L. McClelland, et al. Language models show human-like content effects on reasoning, 2022, ArXiv.
[4] Yuhuai Wu, et al. Exploring Length Generalization in Large Language Models, 2022, NeurIPS.
[5] Pedro A. Ortega, et al. Neural Networks and the Chomsky Hierarchy, 2022, ICLR.
[6] Eric Schulz, et al. Using cognitive psychology to understand GPT-3, 2022, Proceedings of the National Academy of Sciences of the United States of America.
[7] Adrià Puigdomènech Badia, et al. The CLRS Algorithmic Reasoning Benchmark, 2022, ICML.
[8] Ian S. Fischer, et al. Multi-Game Decision Transformers, 2022, NeurIPS.
[9] Sergio Gomez Colmenarejo, et al. A Generalist Agent, 2022, Trans. Mach. Learn. Res.
[10] R. Thomas McCoy, et al. Neurocompositional computing: From the Central Paradox of Cognition to a new generation of AI systems, 2022, AI Mag.
[11] Omer Levy, et al. Transformer Language Models without Positional Encodings Still Learn Positional Information, 2022, EMNLP.
[12] Matt Gardner, et al. Impact of Pretraining Term Frequencies on Few-Shot Reasoning, 2022, ArXiv.
[13] Yuri Burda, et al. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, 2022, ArXiv.
[14] J. Schmidhuber, et al. The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization, 2021, ICLR.
[15] Noah A. Smith, et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, 2021, ICLR.
[16] J. Schmidhuber, et al. The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers, 2021, EMNLP.
[17] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, ArXiv.
[18] J. Ainslie, et al. Making Transformers Solve Compositional Tasks, 2021, ACL.
[19] Sergey Levine, et al. Offline Reinforcement Learning as One Big Sequence Modeling Problem, 2021, NeurIPS.
[20] Pieter Abbeel, et al. Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, NeurIPS.
[21] Jianlin Su, et al. RoFormer: Enhanced Transformer with Rotary Position Embedding, 2021, Neurocomputing.
[22] Alec Radford, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[23] Lior Wolf, et al. Transformer Interpretability Beyond Attention Visualization, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[25] Omer Levy, et al. Emergent linguistic structure in artificial neural networks trained by self-supervision, 2020, Proceedings of the National Academy of Sciences.
[26] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[27] Marco Baroni, et al. Syntactic Structure from Deep Learning, 2020, Annual Review of Linguistics.
[28] Manaal Faruqui, et al. Attention Interpretability Across NLP Tasks, 2019, ArXiv.
[29] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.
[30] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[31] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[32] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.
[33] Madhura R. Joglekar, et al. Task representations in neural networks trained to perform many cognitive tasks, 2019, Nature Neuroscience.
[34] Marco Baroni, et al. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks, 2017, ICML.
[35] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[36] Alex Graves, et al. Neural Turing Machines, 2014, ArXiv.
[37] J. Fodor, et al. Connectionism and cognitive architecture: A critical analysis, 1988, Cognition.
[38] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.