Hyperbolic Attention Networks

We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks, which allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for the hyperboloid and Klein models. Our method improves generalization on neural machine translation, learning on graphs, and visual question answering tasks while keeping the neural representations compact.
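
To make the idea of re-expressing soft attention with hyperboloid and Klein operations more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes unit negative curvature, lifts Euclidean vectors onto the hyperboloid by solving for the time coordinate, scores query-key pairs by negative hyperboloid distance passed through a softmax, and aggregates values with the Einstein midpoint in the Klein model. All function names and the scalars `beta` and `c` are hypothetical choices made for this example.

```python
import numpy as np

def to_hyperboloid(x):
    """Lift Euclidean points (n, d) onto the unit hyperboloid as (x0, x)."""
    x0 = np.sqrt(1.0 + np.sum(x * x, axis=-1, keepdims=True))
    return np.concatenate([x0, x], axis=-1)

def hyperboloid_distance(q, k):
    """Pairwise distance arccosh(-<q, k>_M), with Minkowski inner product
    <q, k>_M = -q0*k0 + sum_i qi*ki. Shapes: (n, d+1), (m, d+1) -> (n, m)."""
    inner = q[:, 1:] @ k[:, 1:].T - np.outer(q[:, 0], k[:, 0])
    # Clip guards against values slightly below 1 from rounding error.
    return np.arccosh(np.clip(-inner, 1.0, None))

def hyperboloid_to_klein(x):
    """Project hyperboloid points into the Klein ball: spatial part / time part."""
    return x[:, 1:] / x[:, :1]

def einstein_midpoint(weights, v_klein):
    """Weighted Einstein midpoint of Klein points.
    weights: (n, m) non-negative, v_klein: (m, d) -> (n, d)."""
    # Lorentz factor of each value point.
    gamma = 1.0 / np.sqrt(1.0 - np.sum(v_klein * v_klein, axis=-1))
    w = weights * gamma[None, :]
    return (w @ v_klein) / np.sum(w, axis=-1, keepdims=True)

def hyperbolic_attention(queries, keys, values, beta=1.0, c=0.0):
    """Sketch of attention with hyperbolic matching and aggregation.
    beta and c are assumed scalar scale and offset parameters."""
    q = to_hyperboloid(queries)
    k = to_hyperboloid(keys)
    v = hyperboloid_to_klein(to_hyperboloid(values))
    logits = -beta * hyperboloid_distance(q, k) - c
    logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return einstein_midpoint(weights, v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    out = hyperbolic_attention(rng.normal(size=(3, 4)),
                               rng.normal(size=(5, 4)),
                               rng.normal(size=(5, 4)))
    print(out.shape)
```

In this sketch the output for each query is a point inside the Klein unit ball, because the Einstein midpoint is a convex combination of the value points. Replacing the distance-based score with a dot product and the midpoint with a plain weighted average would recover ordinary Euclidean soft attention.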
