Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

As modern neural networks have grown to billions of parameters, meeting tight latency budgets has become increasingly challenging. Approaches like compression, sparsification and network pruning have proven effective at tackling this problem, but they rely on modifying the underlying network. In this paper, we look at a complementary approach: optimizing how tensors are mapped to on-chip memory in an inference accelerator while leaving the network parameters untouched. Since different memory components trade off capacity for bandwidth differently, a sub-optimal mapping can result in high latency. We introduce evolutionary graph reinforcement learning (EGRL), a method combining graph neural networks, reinforcement learning (RL) and evolutionary search, that aims to find the optimal mapping to minimize latency. Furthermore, a set of fast, stateless policies guides the evolutionary search to improve sample efficiency. We train and validate our approach directly on the Intel NNP-I chip for inference using a batch size of 1. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
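
To make the memory-mapping problem concrete, the sketch below runs a plain evolutionary search over tensor-to-memory assignments. It is an illustration only, not EGRL itself: it has no graph neural network or RL component, and latency comes from a toy analytical model rather than measurements on the NNP-I. The memory names, capacities, bandwidths, the spill-to-DRAM rule and all function names (`latency`, `mutate`, `evolve`) are assumptions introduced for this example.

```python
import random

# Hypothetical memory levels: (name, capacity in bytes, bandwidth in bytes per time unit).
# These figures are made up for illustration and do not describe any real chip.
MEMORIES = [("DRAM", 10**9, 1.0), ("LLC", 4096, 4.0), ("SRAM", 1024, 16.0)]


def latency(mapping, sizes):
    """Toy cost model: each tensor costs size / bandwidth of its memory.

    A tensor that does not fit in its assigned memory spills to DRAM (index 0).
    """
    used = [0] * len(MEMORIES)
    total = 0.0
    for size, mem in zip(sizes, mapping):
        if used[mem] + size > MEMORIES[mem][1]:
            mem = 0  # assumed fallback: spill to the largest, slowest memory
        used[mem] += size
        total += size / MEMORIES[mem][2]
    return total


def mutate(mapping, rng, rate=0.2):
    """Reassign each tensor to a random memory with probability `rate`."""
    return [rng.randrange(len(MEMORIES)) if rng.random() < rate else m
            for m in mapping]


def evolve(sizes, generations=200, pop_size=20, elites=5, seed=0):
    """Elitist evolutionary search over mappings (list of memory indices)."""
    rng = random.Random(seed)
    # Start from the all-DRAM baseline plus random mappings.
    population = [[0] * len(sizes)]
    population += [[rng.randrange(len(MEMORIES)) for _ in sizes]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda m: latency(m, sizes))
        parents = population[:elites]  # the best mappings survive unchanged
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - elites)]
        population = parents + children
    return min(population, key=lambda m: latency(m, sizes))


if __name__ == "__main__":
    tensor_sizes = [512, 256, 2048, 128, 1024]  # hypothetical tensor sizes in bytes
    best = evolve(tensor_sizes)
    print([MEMORIES[m][0] for m in best], latency(best, tensor_sizes))
```

Because the all-DRAM mapping is in the initial population and the best mappings are carried over unchanged each generation, the returned mapping is never worse than that baseline under this toy model. In the setting the abstract describes, the cost would instead be latency measured on hardware, which is why sample efficiency matters there.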
