Energy-efficient parameterized 2-D separable convolution on FPGA

2-D convolution is widely used in image and video processing applications. Prior work has examined design trade-offs among area, arithmetic resources, and throughput, but the effect of these trade-offs on energy efficiency is not well studied. In this work, we propose an energy-efficient parameterized convolution architecture for 2-D separable kernels. Two types of algorithm mapping parameters, namely the buffering scheme and the level of parallelism, characterize the architecture. By comparing the energy efficiency of architectures with various buffering schemes and levels of parallelism, we demonstrate that the design strategy for high energy efficiency differs from the design strategy for high memory and area efficiency. With the optimized buffering scheme and level of parallelism, the energy consumption of both on-chip memory and external memory is significantly reduced while high throughput is maintained. We also propose a DRAM activation schedule to further reduce the energy consumption of external memory. We implement the energy-optimized architecture on a state-of-the-art FPGA for various image sizes. Our design sustains up to 61.8% of the peak energy efficiency of the device. Compared with the state-of-the-art design, our design improves energy efficiency by up to 34.1%.
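
For readers unfamiliar with separable kernels, the following is a minimal software sketch of the general idea, not the paper's FPGA architecture. It assumes a kernel that factors into a column vector and a row vector, and it uses the sliding-window (correlation) form without kernel flipping, restricted to the "valid" output region. All function names are illustrative. A K x K separable kernel can be applied as a 1-D row pass followed by a 1-D column pass, which needs roughly 2K multiplications per output pixel instead of K*K.

```python
def conv1d_rows(image, taps):
    """Slide a 1-D filter along each row, keeping only fully covered positions."""
    k = len(taps)
    return [
        [sum(row[x + i] * taps[i] for i in range(k))
         for x in range(len(row) - k + 1)]
        for row in image
    ]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def separable_conv2d(image, col_taps, row_taps):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    row_pass = conv1d_rows(image, row_taps)
    return transpose(conv1d_rows(transpose(row_pass), col_taps))


def outer(u, v):
    """Outer product u * v^T, giving the equivalent full 2-D kernel."""
    return [[a * b for b in v] for a in u]


def direct_conv2d(image, kernel):
    """Reference: slide the full 2-D kernel over the image."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [sum(image[y + j][x + i] * kernel[j][i]
             for j in range(kh) for i in range(kw))
         for x in range(w - kw + 1)]
        for y in range(h - kh + 1)
    ]


# Example: a Sobel-like kernel, which factors into [1, 2, 1]^T and [1, 0, -1].
image = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]
col_taps, row_taps = [1, 2, 1], [1, 0, -1]
assert separable_conv2d(image, col_taps, row_taps) == direct_conv2d(
    image, outer(col_taps, row_taps)
)
```

In a hardware mapping, the intermediate result between the two passes has to be held in some buffer. That is one place where a choice of buffering scheme, as discussed in the abstract, would come into play.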
