Energy-efficient architecture for stride permutation on streaming data

Stride permutation is widely used in digital signal processing algorithms implemented on FPGAs. Permuting a long data sequence through hardwired connections leads to high area consumption and routing complexity. A preferable approach is to build a hardware structure that permutes streaming data inputs. In this paper, we present an energy-efficient architecture for stride permutation on streaming data, where both the supported problem size and the stride are powers of two. A three-stage structure, consisting of two interconnection-network stages and one data-buffer stage, serves as the baseline architecture. To improve energy efficiency, we develop a data remapping technique that reduces the required memory by 50% at the expense of a small amount of extra logic. We also present a multiplexer-based cyclic shift interconnection network. The proposed architecture is evaluated using two performance metrics: the composite Energy × Area × Time (EAT) product and energy efficiency (defined as points/Joule). Experimental results show that the proposed data remapping technique reduces dynamic power consumption by up to 40% compared with the baseline architecture. The proposed architecture achieves an energy efficiency of up to 75.3 giga points/Joule and an EAT ratio of 0.31 to 0.35 relative to the baseline architecture for streaming widths 2 ≤ w ≤ 32.
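To make the terms in the abstract concrete, the Python sketch below is a hypothetical behavioural model, not the authors' design; every function name in it is illustrative. It assumes the common convention that the stride-s permutation of N = m·s points outputs x[0], x[s], x[2s], and so on, i.e. it writes the data row-major into an m×s matrix and reads it column-major. For N = 2^n and s = 2^t, the source index of output j is just j rotated left by t bits, which hints at why power-of-two sizes keep address generation cheap. `mux_cyclic_shift` models a log2(w)-stage barrel shifter built from 2:1 multiplexers, one plausible form of a multiplexer-based cyclic shift network. `single_buffer_stream` shows a generic way a single N-word buffer can replace a double (ping-pong) buffer and so halve memory; it is swapped in for illustration and is not claimed to be the paper's data remapping technique. `points_per_joule` assumes one w-point word is accepted every clock cycle.

```python
# Hypothetical behavioural model, NOT the paper's RTL. Assumes N = 2**n points,
# stride s = 2**t, and a streaming width of w (a power of two) lanes per cycle.


def stride_perm(x, s):
    """Reference stride-s permutation: y[j] = x[(j % m) * s + j // m], m = N // s.
    Same as writing x row-major into an m-by-s matrix and reading column-major."""
    n_pts = len(x)
    assert n_pts % s == 0
    m = n_pts // s
    return [x[(j % m) * s + j // m] for j in range(n_pts)]


def rotl(j, t, n):
    """Rotate the n-bit index j left by t bits. With N = 2**n and s = 2**t,
    rotl(j, t, n) equals (j % m) * s + j // m, the source index of output j."""
    if n == 0:
        return 0
    mask = (1 << n) - 1
    t %= n
    return ((j << t) | (j >> (n - t))) & mask


def mux_cyclic_shift(lanes, k):
    """Cyclic shift of w lanes by k using log2(w) stages of 2:1 multiplexers.
    Stage b either passes data straight through or rotates it by 2**b lanes,
    selected by bit b of k. Result: out[i] = lanes[(i - k) % w]."""
    w = len(lanes)
    assert w > 0 and w & (w - 1) == 0, "w must be a power of two"
    out = list(lanes)
    step = 1
    while step < w:
        if k & step:  # select signal of this multiplexer stage
            out = [out[(i - step) % w] for i in range(w)]
        step <<= 1
    return out


def single_buffer_stream(blocks, s):
    """Permute consecutive N-point blocks with ONE N-word buffer instead of a
    ping-pong pair. Each step reads the word to be output next and writes the
    incoming word into the freed slot, so block k ends up stored at addresses
    rotl(i, k * t, n). Generic in-place technique, assumed for illustration."""
    n_pts = len(blocks[0])
    n, t = n_pts.bit_length() - 1, s.bit_length() - 1
    assert n_pts == 1 << n and s == 1 << t and t <= n
    mem = list(blocks[0])  # block 0 stored in natural order (identity map)
    out = []
    for k in range(1, len(blocks) + 1):
        incoming = blocks[k] if k < len(blocks) else [None] * n_pts  # drain
        y = []
        for j in range(n_pts):
            addr = rotl(j, k * t, n)  # where source element of output j sits
            y.append(mem[addr])  # read first
            mem[addr] = incoming[j]  # then overwrite the same address
        out.append(y)
    return out


def points_per_joule(w, f_hz, power_w):
    """Energy efficiency, assuming one w-point word is accepted per cycle:
    throughput in points/s divided by power in W (= J/s)."""
    return w * f_hz / power_w


if __name__ == "__main__":
    blocks = [list(range(16 * b, 16 * b + 16)) for b in range(3)]
    assert single_buffer_stream(blocks, 4) == [stride_perm(b, 4) for b in blocks]
    print(stride_perm(list(range(8)), 2))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

In the single-buffer model every address is read exactly once and then immediately overwritten within each block, so no location ever has to hold two live words; the price is that the address pattern rotates from block to block, which in hardware means a little extra address-generation logic.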

Viktor K. Prasanna | Ren Chen
