C-MemMAP: clustering-driven compact, adaptable, and generalizable meta-LSTM models for memory access prediction

With the rise of Big Data, there has been a significant effort to increase compute power through GPUs, TPUs, and heterogeneous architectures. As a result, many applications are now memory bound, i.e., bottlenecked by the movement of data from main memory to the compute units. One way to address this issue is data prefetching, which relies on accurate prediction of memory accesses. While recent deep learning models have performed well on sequence prediction problems, they are far too heavy in model size and inference latency to be practical for data prefetching. Here, we propose clustering-driven compact LSTM models that predict the next memory access with high accuracy. We introduce a novel clustering approach, called the Delegated Model, that reliably clusters applications. For each cluster, we train a compact meta-LSTM model that can quickly adapt to any application in that cluster; a minimal sketch of these two runtime pieces follows below. Prior LSTM-based work on access prediction has used orders of magnitude more parameters and trained one model per application (trace). While one specialized model per application can yield higher accuracy, the approach does not scale. In contrast, our models predict for a whole class of applications, trading some specialization, at the cost of a few retraining steps at runtime, for a more generalizable compact meta-model. Experiments on 13 benchmark applications demonstrate that clustering-driven ensembles of compact meta-models attain accuracy close to that of specialized models after only a few batches of retraining for the majority of the applications.
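To make the two runtime pieces concrete, here is a minimal PyTorch sketch, not the authors' released code: a compact LSTM that classifies the next address delta over a fixed delta vocabulary, and a first-order, MAML-style adaptation loop that specializes a cluster's meta-initialization to a new application with a few gradient steps. The delta-vocabulary encoding, layer sizes, and the use of first-order fine-tuning (rather than full second-order MAML) are assumptions for illustration.

```python
# Minimal sketch of a compact next-delta LSTM predictor and few-step,
# MAML-style adaptation. All sizes and names are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class CompactAccessLSTM(nn.Module):
    """Small next-delta classifier: embed recent address deltas, run a
    single LSTM layer, and predict the next delta from a fixed vocabulary."""
    def __init__(self, num_deltas=1024, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_deltas, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_deltas)

    def forward(self, delta_ids):              # (batch, seq_len) int64
        h, _ = self.lstm(self.embed(delta_ids))
        return self.head(h[:, -1, :])          # logits for the next delta

def adapt_to_application(meta_model, support_batches, steps=5, lr=1e-3):
    """Specialize a cluster's meta-initialization to one application by
    taking a few gradient steps on batches drawn from its trace."""
    model = copy.deepcopy(meta_model)          # leave the meta-weights intact
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for deltas, next_delta in support_batches:
            opt.zero_grad()
            loss_fn(model(deltas), next_delta).backward()
            opt.step()
    return model

# Usage: adapt the cluster's meta-model with a handful of trace batches.
meta = CompactAccessLSTM()
batch = (torch.randint(0, 1024, (8, 16)), torch.randint(0, 1024, (8,)))
specialized = adapt_to_application(meta, [batch], steps=3)
```

Deep-copying before adaptation reflects the paper's deployment story: the shared meta-model stays generic for its cluster, while each application gets a cheap, short-lived specialization at runtime.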
