3M-AI: A Multi-task and Multi-core Virtualization Framework for Multi-FPGA AI Systems in the Cloud

With the ever-growing demand for online Artificial Intelligence (AI), hardware virtualization support for deep learning accelerators is vital for providing AI capability in the cloud. Three features are fundamental to hardware virtualization: multi-task support, dynamic workloads, and remote access. However, most deep learning accelerators do not support concurrent execution of multiple tasks. Moreover, state-of-the-art multi-DNN scheduling algorithms for NN accelerators consider neither concurrent multi-task execution nor resource allocation for multi-core DNN accelerators. In addition, existing GPU virtualization solutions can introduce substantial remote access latency overhead, resulting in severe system performance degradation. To tackle these challenges, we propose 3M-AI, a Multi-task and Multi-core virtualization framework for Multi-FPGA AI systems in the cloud. 3M-AI enables model parallelism on multiple FPGAs by optimizing data synchronization and movement between FPGAs. 3M-AI exploits a heuristic hardware resource allocation algorithm and an accurate multi-core latency prediction model. 3M-AI reduces the remote API access overhead to nearly 1%, and achieves lower NN inference latency at batch size 1 than GPU virtualization solutions.
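To make the abstract's allocation idea concrete, the following is a minimal sketch of how a heuristic core-allocation step driven by a latency predictor might look; it is not the paper's implementation, and all names here (Task, predict_latency, allocate_cores, the toy linear-speedup predictor) are illustrative assumptions.

```python
# Minimal sketch (not 3M-AI's actual algorithm): a greedy heuristic that assigns
# FPGA cores to queued DNN tasks using a latency-prediction callback.
# All names and the toy predictor below are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    name: str
    deadline_ms: float  # latency target for this inference request

def allocate_cores(tasks: List[Task],
                   total_cores: int,
                   predict_latency: Callable[[Task, int], float]) -> Dict[str, int]:
    """Greedily give each task the fewest cores predicted to meet its deadline."""
    allocation: Dict[str, int] = {}
    free_cores = total_cores
    # Serve the tightest deadlines first.
    for task in sorted(tasks, key=lambda t: t.deadline_ms):
        cores = 1
        # Add cores until the predicted latency fits the deadline or cores run out.
        while cores < free_cores and predict_latency(task, cores) > task.deadline_ms:
            cores += 1
        allocation[task.name] = cores
        free_cores -= cores
        if free_cores <= 0:
            break
    return allocation

if __name__ == "__main__":
    # Toy predictor assuming near-linear speedup from an 8 ms single-core latency.
    toy_predict = lambda task, cores: 8.0 / cores
    tasks = [Task("resnet50", deadline_ms=4.0), Task("bert", deadline_ms=2.5)]
    print(allocate_cores(tasks, total_cores=8, predict_latency=toy_predict))
```

In 3M-AI itself, the latency predictor would be the paper's multi-core latency prediction model rather than the toy linear function used above.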