Optimal Dynamic Data Layouts for 2D FFT on 3D Memory Integrated FPGA

FPGAs have been widely used for accelerating various applications. For many data-intensive applications, memory bandwidth can limit performance. 3D memories with through-silicon-via (TSV) connections offer potential solutions to the latency and bandwidth issues. In this paper, we revisit the classic 2D FFT problem to evaluate the performance of a 3D memory integrated FPGA. To fully exploit the fine-grained parallelism of 3D memory, data layouts that make effective use of the device's peak bandwidth are needed. We therefore propose dynamic data layouts specifically to optimize performance on the 3D architecture. In a 2D FFT, data is accessed in row-major order in the first phase, whereas it is accessed in column-major order in the second phase. Column-major access results in high memory latency and low bandwidth because of the high row activation overhead of the memory. We therefore develop dynamic data layouts to improve memory access performance in the second phase. By exploiting parallelism in the third dimension of the memory, data parallelism can be increased to further improve performance. We adopt a model-based approach for the 3D memory and perform experiments on the FPGA to validate our analysis and evaluate performance. Our experimental results demonstrate up to 40x peak memory bandwidth utilization for column-wise FFT, resulting in approximately 97% improvement in throughput for the complete 2D FFT application compared to the baseline architecture.
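The row activation overhead of the second phase can be made concrete with a small model. The Python sketch below counts DRAM row activations for the column-wise phase under two layouts: the plain row-major layout left by the first phase, and a tiled layout in which each b x b block occupies one DRAM row. Everything here is an illustrative assumption rather than the paper's model: the single-bank open-page memory, the helper names (`row_major`, `tiled`, `column_phase_activations`), processing b columns in parallel, and the sizes. The tiled layout is a simplified stand-in for a dynamic data layout, not the layout developed in the paper.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical address map: (row, col) of the N x N matrix -> linear element address.
AddrFn = Callable[[int, int], int]


def row_major(n: int, b: int) -> AddrFn:
    """Row-major layout: element (r, c) at r * n + c (b unused, kept for a uniform signature)."""
    return lambda r, c: r * n + c


def tiled(n: int, b: int) -> AddrFn:
    """Hypothetical tiled layout: each b x b tile is stored contiguously.
    Assumes n is divisible by b."""
    tiles_per_row = n // b

    def addr(r: int, c: int) -> int:
        tile = (r // b) * tiles_per_row + (c // b)
        return tile * b * b + (r % b) * b + (c % b)

    return addr


def column_phase_order(n: int, b: int) -> Iterator[Tuple[int, int]]:
    """Column-phase access order: b columns are processed in parallel, so for
    each strip of b columns we sweep all rows and touch b elements per row."""
    for strip in range(0, n, b):
        for r in range(n):
            for c in range(strip, strip + b):
                yield r, c


def count_activations(addresses: Iterable[int], row_elems: int) -> int:
    """Assumed single bank with an open-page policy: count accesses that fall
    in a DRAM row other than the one currently open."""
    open_row: Optional[int] = None
    activations = 0
    for a in addresses:
        dram_row = a // row_elems
        if dram_row != open_row:
            activations += 1
            open_row = dram_row
    return activations


def column_phase_activations(layout: Callable[[int, int], AddrFn],
                             n: int, b: int, row_elems: int) -> int:
    """Row activations needed to read the whole matrix in column-phase order."""
    addr = layout(n, b)
    return count_activations(
        (addr(r, c) for r, c in column_phase_order(n, b)), row_elems)


if __name__ == "__main__":
    N, B = 256, 8        # hypothetical matrix size and tile width
    ROW_ELEMS = B * B    # hypothetical DRAM row size in elements (one tile per row)
    for name, layout in (("row-major", row_major), ("tiled", tiled)):
        print(name, column_phase_activations(layout, N, B, ROW_ELEMS))
```

With the hypothetical sizes N = 256, b = 8 and 64 elements per DRAM row, the script prints 8192 activations for row-major and 1024 for tiled, that is N^2/b versus N^2/b^2. In this toy model the tiled layout cuts activations by a factor of b, and its count matches the N^2/64 activations that a row-wise sweep of the row-major layout needs in the first phase.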
