High throughput energy efficient parallel FFT architecture on FPGAs

Throughput is a key performance metric for streaming FFT architectures. However, increasing spatial parallelism to improve throughput introduces complex routing, thus resulting in high power consumption. In this paper, we propose a high throughput energy efficient parallel FFT architecture based on Cooley-Tukey algorithm. Multiple pipeline FFT processors using time-multiplexing are utilized to perform FFT computation tasks in parallel. This design realizes high performance using task-level parallelism and avoids complex routing. Furthermore, to reduce the memory power consumption, a periodic memory activation (PMA) scheme is developed. By analyzing energy efficiency (defined as GOPS/Joule) asymptotically, we show that our design achieves a low energy efficiency complexity while satisfying a high-throughput requirement. For N-point FFT (64 ≤ N ≤ 4096), our proposed architecture achieves 50 ~ 63 GOPS/Joule, i.e., up to 78% of the Peak Energy Efficiency of FFT designs on FPGAs. Compared with a state-of-the-art design, our design improves the energy efficiency (defined as GOPS/Joule) by 17% to 26% with the same throughput.

[1]  C. K. Yuen,et al.  Theory and Application of Digital Signal Processing , 1978, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  James C. Hoe,et al.  Automatic generation of customized discrete Fourier transform IPs , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[3]  J. Tukey,et al.  An algorithm for the machine calculation of complex Fourier series , 1965 .

[4]  James C. Hoe,et al.  Permuting streaming data using RAMs , 2009, JACM.

[5]  Viktor K. Prasanna,et al.  Energy-efficient signal processing using FPGAs , 2003, FPGA '03.

[6]  Bernard Chazelle,et al.  Census functions: An approach to VLSI upper bounds , 1981, 22nd Annual Symposium on Foundations of Computer Science (sfcs 1981).

[7]  Yu-Wei Lin,et al.  A 1-GS/s FFT/IFFT processor for UWB applications , 2005, IEEE Journal of Solid-State Circuits.

[8]  Mats Torkelson,et al.  A new approach to pipeline FFT processor , 1996, Proceedings of International Conference on Parallel Processing.

[9]  Suhwan Kim,et al.  Power-complexity analysis of pipelined VLSI FFT architectures for low energy wireless communication applications , 1999, 42nd Midwest Symposium on Circuits and Systems (Cat. No.99CH36356).

[10]  Bevan M. Baas,et al.  A low-power, high-performance, 1024-point FFT processor , 1999, IEEE J. Solid State Circuits.

[11]  Alvin M. Despain,et al.  Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementations , 1984, IEEE Transactions on Computers.

[12]  Steven J. E. Wilton,et al.  A detailed power model for field-programmable gate arrays , 2005, TODE.

[13]  E. V. Jones,et al.  A pipelined FFT processor for word-sequential data , 1989, IEEE Trans. Acoust. Speech Signal Process..

[14]  Chein-Wei Jen,et al.  High-speed and low-power split-radix FFT , 2003, IEEE Trans. Signal Process..