Automatic generation of high throughput energy efficient streaming architectures for arbitrary fixed permutations

Because of their high data rates and simple control, streaming architectures have become popular for hardware implementations of data-intensive applications. A key problem in designing such architectures is permuting streaming data. In this paper, we present a technique to realize arbitrary fixed permutations on streaming data. We develop a parameterized architecture that accepts data streams as input and generates the permuted data after a certain amount of delay. Our design accepts continuous input at a fixed rate of p data elements per cycle, where p is the data parallelism of the architecture. To construct the streaming architecture for a given fixed permutation, we develop a mapping approach that configures the classic Benes network to obtain the datapath and the control logic. We demonstrate a complete design automation tool that takes design parameters as input, including the permutation pattern and the data parallelism p, and produces a register-transfer-level Verilog description of the design. We evaluate the generated designs on a Xilinx Virtex-7 FPGA using post-place-and-route results.
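
To make the streaming interface concrete, the following is a minimal behavioral sketch in Python of what such a design computes; it is not the generated Verilog and does not reproduce the mapping approach. The names stream_permute and benes_size are hypothetical. The sketch assumes the convention output[i] = input[perm[i]], a frame of N = len(perm) elements with p dividing N, and, as a simplification, it buffers a whole frame before emitting, so its delay need not match that of the actual architecture. The helper benes_size reports only the well-known size of a classic N-input Benes network (2 log2 N - 1 stages of N/2 two-by-two switches), not the size of the datapath produced by the tool.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def stream_permute(words: Iterable[Sequence[int]],
                   perm: Sequence[int],
                   p: int) -> Iterator[List[int]]:
    """Behavioral model: consume p elements per cycle, emit p per cycle.

    Assumed convention (not taken from the paper): output[i] = input[perm[i]].
    """
    n = len(perm)
    if p <= 0 or n % p != 0:
        raise ValueError("p must be positive and divide the frame size")
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..n-1")

    frame: List[int] = []
    for word in words:
        if len(word) != p:
            raise ValueError("each input word must carry exactly p elements")
        frame.extend(word)
        if len(frame) == n:
            # Simplification: permute once the whole frame has arrived.
            out = [frame[src] for src in perm]
            for start in range(0, n, p):
                yield out[start:start + p]
            frame = []


def benes_size(n: int) -> Tuple[int, int]:
    """Return (stages, switches per stage) of a classic n-input Benes network."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 2")
    log_n = n.bit_length() - 1
    return 2 * log_n - 1, n // 2


# Example: 8-point bit-reversal permutation streamed at p = 2 per cycle.
bit_reversal = [0, 4, 2, 6, 1, 5, 3, 7]
inputs = [[0, 1], [2, 3], [4, 5], [6, 7]]
outputs = list(stream_permute(inputs, bit_reversal, p=2))
```

In this example the frame of eight elements enters over four cycles and leaves over four cycles, which mirrors the fixed input and output rate of p elements per cycle described above.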
