Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

As modern neural networks have grown to billions of parameters, meeting tight latency budgets has become increasingly challenging. Approaches like compression, sparsification and network pruning have proven effective at tackling this problem, but they rely on modifying the underlying network. In this paper, we pursue a complementary approach: optimizing how tensors are mapped to on-chip memory in an inference accelerator while leaving the network parameters untouched. Since different memory components trade capacity for bandwidth differently, a sub-optimal mapping can result in high latency. We introduce evolutionary graph reinforcement learning (EGRL), a method combining graph neural networks, reinforcement learning (RL) and evolutionary search, which aims to find the mapping that minimizes latency. Furthermore, a set of fast, stateless policies guides the evolutionary search to improve sample efficiency. We train and validate our approach directly on the Intel NNP-I chip for inference with a batch size of 1. EGRL outperforms policy-gradient, evolutionary-search and dynamic-programming baselines on BERT, ResNet-101 and ResNet-50, achieving a 28-78% speed-up over the native NNP-I compiler on all three workloads.
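To make the search component concrete, below is a minimal, self-contained sketch of an evolutionary search over tensor-to-memory mappings against a toy latency model. It is illustrative only: the memory levels, capacities, bandwidths, mutation scheme and cost model are assumptions invented for the example, not NNP-I details. In the paper, the evolutionary search operates over policies that emit mappings (with latency measured on the actual chip) rather than over raw mappings directly; the flat genome below stands in for that layer.

```python
import random

# Hypothetical memory levels: name -> (capacity in MB, relative bandwidth).
# Illustrative numbers only, not NNP-I specifications.
MEM_LEVELS = {"SRAM": (4, 8.0), "LLC": (24, 3.0), "DRAM": (32768, 1.0)}


def toy_latency(mapping, tensor_sizes):
    """Toy cost model: transfer time ~ size / bandwidth, plus a large
    penalty when a level's capacity is oversubscribed (a stand-in for
    spills; the real objective is latency measured on hardware)."""
    used = {level: 0.0 for level in MEM_LEVELS}
    latency = 0.0
    for tensor, level in mapping.items():
        size = tensor_sizes[tensor]
        _, bandwidth = MEM_LEVELS[level]
        used[level] += size
        latency += size / bandwidth
    for level, (capacity, _) in MEM_LEVELS.items():
        if used[level] > capacity:
            latency += 10.0 * (used[level] - capacity)
    return latency


def mutate(mapping, rate=0.1):
    """Reassign each tensor to a random level with small probability."""
    child = dict(mapping)
    for tensor in child:
        if random.random() < rate:
            child[tensor] = random.choice(list(MEM_LEVELS))
    return child


def evolve(tensor_sizes, pop_size=50, generations=200):
    """Plain elitist evolutionary search over tensor-to-memory mappings."""
    levels = list(MEM_LEVELS)
    population = [{t: random.choice(levels) for t in tensor_sizes}
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: toy_latency(m, tensor_sizes))
        elites = population[: pop_size // 5]  # keep the fastest mappings
        population = elites + [mutate(random.choice(elites))
                               for _ in range(pop_size - len(elites))]
    return min(population, key=lambda m: toy_latency(m, tensor_sizes))


if __name__ == "__main__":
    sizes = {f"tensor_{i}": random.uniform(0.5, 16.0) for i in range(40)}  # MB
    best = evolve(sizes)
    print(f"best toy latency: {toy_latency(best, sizes):.2f}")
```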
