Algorithmic optimizations for energy efficient throughput-oriented FFT architectures on FPGA

Energy efficiency is a key design metric when implementing signal processing applications on FPGAs. In this paper, we propose high-level energy optimizations to facilitate the development of an energy-efficient, throughput-oriented FFT design. At the algorithm mapping level, we develop a data remapping technique and a memory activation scheduling method to reduce memory energy consumption. At the architecture binding level, we explore and identify the optimal memory binding scheme and pipelining strategy for the floating-point units. The experimental results show that the proposed algorithmic optimizations significantly reduce dynamic power dissipation. Compared with the baseline architecture, the optimized architecture achieves 2.97x, 2.99x, and 3.05x improvements in energy efficiency (defined as GFLOPS/W) for 256-, 4096-, and 32768-point FFTs, respectively. Compared with state-of-the-art designs, our implementation achieves up to a 7.39x improvement in energy efficiency while sustaining almost 20x the throughput (defined as MPoints/s).
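
The two metrics named above, GFLOPS/W and MPoints/s, can be illustrated with a minimal sketch. It assumes the conventional estimate of 5 N log2(N) floating-point operations per N-point complex FFT, which the abstract does not state; the function names and all input values are hypothetical and are not results from the paper.

```python
import math


def fft_flops(n: int) -> float:
    """Assumed flop count for one N-point complex FFT: 5 * N * log2(N)."""
    return 5.0 * n * math.log2(n)


def throughput_mpoints_per_s(n: int, ffts_per_second: float) -> float:
    """Throughput in MPoints/s: points processed per second, divided by 1e6."""
    return n * ffts_per_second / 1e6


def energy_efficiency_gflops_per_w(n: int, ffts_per_second: float, power_w: float) -> float:
    """Energy efficiency in GFLOPS/W: sustained GFLOPS divided by power in watts."""
    gflops = fft_flops(n) * ffts_per_second / 1e9
    return gflops / power_w


if __name__ == "__main__":
    # Hypothetical operating point: 1e6 FFTs per second at 2 W, for illustration only.
    for n in (256, 4096, 32768):
        print(n,
              throughput_mpoints_per_s(n, 1e6),
              energy_efficiency_gflops_per_w(n, 1e6, 2.0))
```

Under this definition, a design that sustains the same FFT rate at half the power doubles its GFLOPS/W, which is why reducing dynamic power translates directly into the reported energy-efficiency ratios.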
