Performance Modeling and FPGA Acceleration of Homomorphic Encrypted Convolution

Data privacy is a critical concern when applying Machine Learning (ML) techniques to domains with sensitive data. Homomorphic Encryption (HE), which enables computation on encrypted data, has emerged as a promising approach for performing inference on ML models such as Convolutional Neural Networks (CNNs) in a privacy-preserving manner. A significant portion of the total inference latency is spent performing convolution over homomorphically encrypted data (HE-Convolution). For convolution over plaintext data, low-latency accelerator designs have been proposed using algorithms such as im2col and frequency-domain convolution. However, developing accelerators for the HE versions of these algorithms is non-trivial. In this work, we develop a unified FPGA design that enables low-latency execution of both im2col and frequency-domain HE-Convolution. To enable selection of the more efficient algorithm for each convolution layer of a CNN, we develop a performance model that takes the encryption and convolution-layer parameters as input and outputs the computation and resource requirements of the two algorithms for that layer. We use the performance model to select the convolution algorithm for each layer of ResNet-50 and obtain the first low-latency batch-1 inference accelerator for CNN inference with HE-Convolution targeting FPGAs using HLS. We compare our design against prior CPU-based techniques and show that our accelerator achieves latency speedups in the range of $3.4\times$ to $6.7\times$.
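As background for the im2col algorithm named above: it unrolls the input's receptive-field patches into the columns of a matrix so that convolution reduces to a single matrix multiplication, which maps well to accelerator dataflows. A minimal plaintext sketch (the function name, toy 4×4 input, and all-ones kernel are illustrative, not taken from the paper; the HE version operates on ciphertexts instead):

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unroll k x k patches of a 2-D input into the columns of a matrix,
    so that convolution becomes one matrix-vector product."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((k * k, out_h * out_w))
    idx = 0
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            # Each column holds one flattened k x k patch.
            cols[:, idx] = x[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

# Toy example: 4x4 input, 3x3 all-ones kernel (i.e., a patch sum).
x = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))
cols = im2col(x, 3)
out = (kernel.ravel() @ cols).reshape(2, 2)  # 2x2 output feature map
```

The matrix product replaces the sliding-window loop nest, which is why a single hardware GEMM-style engine can serve im2col convolution; the frequency-domain alternative instead multiplies element-wise after a transform.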