Robust and Scalable String Pattern Matching for Deep Packet Inspection on Multicore Processors

Conventionally, dictionary-based string pattern matching (SPM) has been implemented as an Aho-Corasick deterministic finite automaton (AC-DFA). Because of its large memory footprint, an AC-DFA built from a large dictionary can suffer poor cache performance on multicore processors when the input has a high match ratio. We propose a head-body finite automaton (HBFA), which implements SPM in two parts: a head DFA (H-DFA) and a body NFA (B-NFA). The H-DFA matches the dictionary up to a predefined prefix length in the same way as an AC-DFA, but with a much smaller memory footprint. The B-NFA extends the matching to the full dictionary lengths using a compact variable-stride branch data structure, accelerated by single-instruction multiple-data (SIMD) operations. A branch grafting mechanism is proposed to opportunistically advance the state of the H-DFA with the matching progress made in the B-NFA. Compared with a fully populated AC-DFA, our HBFA prototype takes less than 1/5 of the construction time, requires less than 1/20 of the runtime memory, and achieves 3x to 8x the throughput when matching real-life large dictionaries against inputs with high match ratios. The throughput scales up 27x to over 34 Gbps on a 32-core Intel Manycore Testing Lab machine based on Intel Xeon X7560 processors.
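The head-body split can be illustrated with a small sketch. This is not the HBFA implementation described above. It assumes a much simpler design in which the head is an ordinary Aho-Corasick automaton built over dictionary prefixes truncated to `prefix_len`, and the body is a plain suffix comparison standing in for the variable-stride, SIMD-accelerated B-NFA. Branch grafting is omitted. The names `build_head`, `head_body_match`, and `prefix_len` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_head(patterns, prefix_len):
    """Build an Aho-Corasick automaton over patterns truncated to prefix_len.

    Returns (goto, fail, out), where out[state] lists the indices of the
    patterns whose truncated prefix ends at that state.
    """
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure links
    out = [[]]    # pattern indices recognized at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat[:prefix_len]:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Breadth-first pass to compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def head_body_match(text, patterns, prefix_len):
    """Return (start_index, pattern) for every dictionary match in text.

    Assumption: the "body" here is a direct comparison of the remaining
    suffix, a stand-in for the B-NFA, not the paper's data structure.
    """
    goto, fail, out = build_head(patterns, prefix_len)
    hits = []
    state = 0
    for i, ch in enumerate(text):
        # Head step: advance the small automaton by one input character.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            pat = patterns[idx]
            head_len = min(len(pat), prefix_len)
            # Body step: extend the prefix match to the full pattern length.
            if text.startswith(pat[head_len:], i + 1):
                hits.append((i - head_len + 1, pat))
    return hits
```

In this sketch, the head automaton contains only states up to depth `prefix_len`, which is the source of the memory saving in this simplified setting. Calling `head_body_match("zabcdefabcxyzab", ["abcdef", "abcxyz", "bcd", "ab"], 3)` exercises both short patterns that end inside the head and long patterns that need the body step.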
