Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform

Accelerating database applications using FPGAs has recently been an area of growing interest in both academia and industry. Equi-join is a key database operation whose performance depends heavily on sorting, which has high memory usage on FPGAs. A fully pipelined N-key merge sorter consists of log N sorting stages and uses O(N) memory in total. For large data sets, external memory has to be employed to buffer data between the sorting stages. This introduces pipeline stalls as well as several iterations between the FPGA and external memory, causing significant performance degradation. In this paper, we speed up equi-join using a hybrid CPU-FPGA heterogeneous platform. To alleviate the performance impact of limited memory, we propose a merge-sort-based hybrid design in which the first few sorting stages of the merge sort tree are replaced with "folded" bitonic sorting networks. These "folded" bitonic sorting networks operate in parallel on the FPGA. The partial results are then merged on the CPU to produce the final sorted result. Based on this hybrid sorting design, we develop two streaming join algorithms by optimizing the classic CPU-based nested-loop join and sort-merge join algorithms. Over a range of data set sizes, our design achieves throughput improvements of 3.1x and 1.9x compared with software-only and FPGA-only implementations, respectively. Our design sustains 21.6% of the peak bandwidth, which is 3.9x the utilization obtained by the state-of-the-art FPGA equi-join implementation.
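
The data flow described above (sort fixed-size blocks with bitonic networks, merge the partial results on the CPU, then join the sorted relations) can be illustrated with a minimal software sketch. This is not the paper's implementation: the "folded" hardware network is replaced here by a plain software bitonic sort as a stand-in, and the names `bitonic_sort`, `hybrid_sort`, `sort_merge_join`, the `block` parameter, and the `(key, row_id)` row layout are all hypothetical choices made for illustration.

```python
import heapq

# Assumption: keys are finite numbers, so this sentinel sorts after every real row.
PAD = (float("inf"), -1)


def bitonic_sort(block):
    """Sort a power-of-two-length list with Batcher's bitonic network.

    Software stand-in for the FPGA stage; the compare-exchange pattern is
    data independent, which is what makes it suitable for hardware.
    """
    a = list(block)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare-exchange distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def hybrid_sort(rows, block=8):
    """Sort rows in fixed-size blocks, then merge the sorted runs (CPU stage)."""
    runs = []
    for start in range(0, len(rows), block):
        chunk = rows[start:start + block]
        padded = chunk + [PAD] * (block - len(chunk))
        # Padding sorts to the end, so slicing drops it again.
        runs.append(bitonic_sort(padded)[:len(chunk)])
    return list(heapq.merge(*runs))


def sort_merge_join(r, s):
    """Equi-join two key-sorted lists of (key, payload) rows."""
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Pair every matching row of r with the matching group of s.
            while i < len(r) and r[i][0] == key:
                for t in range(j, j_end):
                    out.append((key, r[i][1], s[t][1]))
                i += 1
            j = j_end
    return out
```

A call such as `sort_merge_join(hybrid_sort(R, block=4), hybrid_sort(S, block=4))` joins two lists of `(key, row_id)` tuples. In this sketch `block` must be a power of two, and it only loosely corresponds to the amount of on-chip memory available to each sorting network.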
