Fast Online Set Intersection for Network Processing on FPGA

Online set intersection is widely used in network processing tasks such as Quality of Service differentiation, firewall processing, and packet/traffic classification. The major challenge is sustaining line-rate processing speed, so accelerating set intersection on state-of-the-art hardware is of great interest to the research community. In this paper, we present a novel high-performance set intersection approach on FPGA. In our approach, each element of a set is represented by a combination of a Group ID (GID) and a Bit Stride (BS); the sets are intersected using linear merge techniques and bitwise AND operations. We map our online set intersection algorithm onto hardware by constructing a modular Processing Element (PE) and concatenating multiple PEs into a tree-based parallel architecture. To improve throughput on a state-of-the-art FPGA, we feed all inputs to the FPGA in a streaming fashion with the help of synchronization GIDs.
Post place-and-route results show that, for a typical set intersection problem in network processing, our design can intersect eight sets, each of up to 32K elements, at a throughput of 47.4 Thousand Intersections Per Second (KIPS) and a latency of 94.8 μs per batch of inputs. Compared to the classic linear merge or bitwise AND techniques on state-of-the-art multi-core processors, our design on FPGA achieves up to 66× throughput improvement and 80× latency reduction.
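
The combination of linear merge and bitwise AND described above can be illustrated in software. The following Python sketch is only an illustration under stated assumptions, not the hardware design: the abstract does not say how an element is split into GID and BS, so the sketch assumes non-negative integer elements, a hypothetical stride width `STRIDE_BITS` of 8, a GID equal to the element divided by the stride width, and a BS that is a bitmask with one bit per element in the group. The function names (`encode`, `decode`, `intersect_pair`, `intersect_tree`) are likewise hypothetical. Each call to `intersect_pair` plays the role of one PE, and `intersect_tree` combines them pairwise, level by level, in the manner of a tree.

```python
STRIDE_BITS = 8  # assumed stride width; not taken from the paper


def encode(elements, stride_bits=STRIDE_BITS):
    """Encode non-negative ints as a GID-sorted list of (gid, bit_stride) pairs."""
    groups = {}
    for x in elements:
        gid, offset = divmod(x, stride_bits)
        # Set the bit for this element inside its group's bit stride.
        groups[gid] = groups.get(gid, 0) | (1 << offset)
    return sorted(groups.items())


def decode(pairs, stride_bits=STRIDE_BITS):
    """Expand (gid, bit_stride) pairs back into a sorted list of ints."""
    out = []
    for gid, bs in pairs:
        for offset in range(stride_bits):
            if (bs >> offset) & 1:
                out.append(gid * stride_bits + offset)
    return out


def intersect_pair(a, b):
    """Intersect two encoded sets: linear merge on GIDs, bitwise AND on strides."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        gid_a, bs_a = a[i]
        gid_b, bs_b = b[j]
        if gid_a < gid_b:
            i += 1
        elif gid_a > gid_b:
            j += 1
        else:
            common = bs_a & bs_b
            if common:  # drop groups with no shared elements
                result.append((gid_a, common))
            i += 1
            j += 1
    return result


def intersect_tree(encoded_sets):
    """Intersect one or more encoded sets by pairwise reduction, level by level."""
    level = list(encoded_sets)
    while len(level) > 1:
        nxt = [intersect_pair(level[k], level[k + 1])
               for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd set out passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


sets = [{1, 5, 9, 200}, {5, 9, 17, 200}, {5, 200, 201}]
common = decode(intersect_tree([encode(s) for s in sets]))
print(common)  # [5, 200]
```

In this sketch the merge only advances over GIDs, so the number of comparisons depends on the number of occupied groups rather than on the number of elements, while the AND resolves up to `STRIDE_BITS` elements in one operation.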
