The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization