Optimizing many-field packet classification on FPGA, multi-core general purpose processor, and GPU

Due to the rapid growth of the Internet, there is an increasing need to efficiently classify packets with many header fields against large rule sets. For example, in Software Defined Networking (SDN), the OpenFlow table lookup can require 15 packet header fields to be examined. In this paper, we present several decomposition-based packet classification implementations with efficient optimization techniques. In the searching phase, packet headers are split or combined. In the merging phase, the partial searching results from all the fields are merged to generate the final result. We prototype our implementations on a state-of-the-art Field Programmable Gate Array (FPGA), multi-core General Purpose Processor (GPP), and Graphics Processing Unit (GPU). On FPGA, we propose two optimization techniques to divide generic ranges; modular processing elements are constructed and concatenated into a systolic array. On multi-core GPP, we parallelize both the searching and merging phases using parallel program threads. On the GPU-accelerated platform, we minimize branch divergence and reduce the data communication overhead. Experimental results show that 500 Million Packets Per Second (MPPS) throughput and 3 μs latency can be achieved for 1.5K rule sets on FPGA. We achieve 14.7 MPPS and 30.5 MPPS throughput for 32K rule sets on the multi-core GPP and GPU-accelerated platforms, respectively. As a heterogeneous solution, our GPU-accelerated packet classifier shows a 2x speedup compared to the implementation using multi-core GPP only. Compared with prior works, our designs can match long packet headers against very complex rule sets.
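The two-phase structure described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes that every rule gives one inclusive range per field, that each per-field search returns a bit vector of matching rules, that merging is a bitwise AND, and that a lower rule index means higher priority. The names `search_field` and `classify` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a rule is one inclusive (low, high) range per header field.
Rule = List[Tuple[int, int]]


def search_field(rules: List[Rule], field: int, value: int) -> int:
    """Searching phase for one field: bit i is set if rule i matches value."""
    mask = 0
    for i, rule in enumerate(rules):
        low, high = rule[field]
        if low <= value <= high:
            mask |= 1 << i
    return mask


def classify(rules: List[Rule], header: List[int]) -> int:
    """Return the index of the highest-priority matching rule, or -1."""
    # Searching phase: each field is examined independently, so these
    # lookups could run in parallel (one per thread or processing element).
    partial = [search_field(rules, f, v) for f, v in enumerate(header)]

    # Merging phase: a rule matches only if it matched in every field.
    merged = (1 << len(rules)) - 1
    for mask in partial:
        merged &= mask

    if merged == 0:
        return -1
    # Assumption: lower index means higher priority, so take the lowest set bit.
    return (merged & -merged).bit_length() - 1


if __name__ == "__main__":
    example_rules = [
        [(0, 10), (5, 5)],      # rule 0: specific
        [(0, 255), (0, 255)],   # rule 1: catch-all
    ]
    print(classify(example_rules, [3, 5]))
    print(classify(example_rules, [3, 6]))
```

Because the per-field searches share no state, the sketch shows why the approach maps naturally onto parallel hardware; the linear scan inside `search_field` is only a placeholder for whatever per-field lookup structure a real design would use.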
