Optimizing Regular Expression Matching with SR-NFA on Multi-Core Systems

Conventionally, regular expression matching (REM) has been performed by sequentially comparing the regular expression (regex) to the input stream, which can be slow due to excessive backtracking [24]. Alternatively, the regex can be converted to a deterministic finite automaton (DFA) for efficient matching; the DFA, however, may require an extremely large state transition table (STT) because of exponential state explosion [7, 4]. We propose the segmented regex-NFA (SR-NFA) architecture, in which the regex is first compiled into modular nondeterministic finite automata (NFA), then partitioned, optimized, and matched efficiently on modern multi-core processors. SR-NFA offers attack-resilient multi-gigabit-per-second matching throughput, suffers from neither backtracking nor state explosion, and can be constructed rapidly. For regex sets whose DFA exhibits moderate state explosion, i.e., on average 200k states in the STT, the proposed SR-NFA is 367k times faster to construct and update and uses 23k times less memory than the DFA approach. Running on an 8-core 2.6 GHz Opteron platform, our prototype achieves 2.2 Gbps average matching throughput for regex sets with up to 4,000 SR-NFA states per regex set.
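The trade-off described above can be illustrated with a minimal sketch. It is not the SR-NFA implementation. It is a textbook state-set NFA simulation plus a subset construction, applied to the illustrative pattern `(a|b)*a(a|b){n}`, which is chosen here as an assumption and is not taken from the evaluated regex sets. The function names (`build_nfa`, `step`, `nfa_match`, `count_dfa_states`) are hypothetical. The NFA needs n + 2 states and is matched without backtracking by tracking the set of active states as a bitmask, whereas the equivalent DFA needs 2^(n+1) states.

```python
from typing import Dict, Tuple

# (state, symbol) -> bitmask of next states; bit i set means state i is active.
Transitions = Dict[Tuple[int, str], int]


def build_nfa(n: int) -> Tuple[Transitions, int, int]:
    """Hand-built NFA for (a|b)*a(a|b){n}: states 0..n+1, state n+1 accepting."""
    delta: Transitions = {(0, "a"): 0b11, (0, "b"): 0b01}  # state 0 loops; 'a' also enters state 1
    for i in range(1, n + 1):
        for sym in "ab":
            delta[(i, sym)] = 1 << (i + 1)  # states 1..n advance on any symbol
    return delta, 1, 1 << (n + 1)  # transitions, start mask, accept mask


def step(delta: Transitions, active: int, sym: str) -> int:
    """Advance every active state on one input symbol (no backtracking)."""
    nxt, state = 0, 0
    while active:
        if active & 1:
            nxt |= delta.get((state, sym), 0)
        active >>= 1
        state += 1
    return nxt


def nfa_match(delta: Transitions, start: int, accept: int, text: str) -> bool:
    """Anchored match: one pass over the input, one step() call per character."""
    active = start
    for ch in text:
        active = step(delta, active, ch)
    return bool(active & accept)


def count_dfa_states(delta: Transitions, start: int, alphabet: str = "ab") -> int:
    """Subset construction: count the DFA states reachable from the start set."""
    seen = {start}
    work = [start]
    while work:
        cur = work.pop()
        for sym in alphabet:
            nxt = step(delta, cur, sym)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in (3, 6, 10):
        delta, start, accept = build_nfa(n)
        print(f"n={n}: NFA states={n + 2}, DFA states={count_dfa_states(delta, start)}")
```

Each input character costs work proportional to the number of NFA states rather than to the number of backtracking paths, which is the property that makes NFA-based matching resistant to crafted inputs. How SR-NFA partitions and schedules these states across cores is not shown here.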

[1] Alfred V. Aho, et al. The Theory of Parsing, Translation, and Compiling, 1972.

[2] George Varghese, et al. Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia, 2007, ANCS '07.

[3] Sotiris Ioannidis, et al. Regular Expression Matching on Graphics Hardware for Intrusion Detection, 2009, RAID.

[4] T. V. Lakshman, et al. Fast and memory-efficient regular expression matching for deep packet inspection, 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[5] Patrick Crowley, et al. A hybrid finite automaton for practical deep packet inspection, 2007, CoNEXT '07.

[6] Viktor K. Prasanna, et al. Compact architecture for high-throughput regular expression matching on FPGA, 2008, ANCS '08.

[7] A. R. Meyer, et al. Economy of Description by Automata, Grammars, and Formal Systems, 1971, SWAT.

[8] Robert McNaughton, et al. Regular Expressions and State Graphs for Automata, 1960, IRE Trans. Electron. Comput.

[9] Fabrizio Petrini, et al. Exact multi-pattern string matching on the cell/b.e. processor, 2008, CF '08.

[10] Josef Grosch. Efficient generation of lexical analysers, 1989, Softw. Pract. Exp.

[11] Cheng-Hung Lin, et al. Optimization of Regular Expression Pattern Matching Circuits on FPGA, 2006, Proceedings of the Design Automation & Test in Europe Conference.

[12] Viktor K. Prasanna, et al. Software Toolchain for Large-Scale RE-NFA Construction on FPGA, 2009, Int. J. Reconfigurable Comput.

[13] Tsern-Huei Lee. Generalized Aho-Corasick Algorithm for Signature Based Anti-Virus Applications, 2007, 2007 16th International Conference on Computer Communications and Networks.

[14] Somesh Jha, et al. Deflating the big bang: fast and scalable deep packet inspection with extended finite automata, 2008, SIGCOMM '08.

[15] Patrick Crowley, et al. An improved algorithm to accelerate regular expression evaluation, 2007, ANCS '07.

[16] Viktor K. Prasanna, et al. Fast Regular Expression Matching Using FPGAs, 2001, The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01).

[17] Cheng-Hung Lin, et al. Optimization of Pattern Matching Circuits for Regular Expression on FPGA, 2007, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[18] Fabrizio Petrini, et al. Tools for Very Fast Regular Expression Matching, 2010, Computer.

[19] Norio Yamagaki, et al. High-speed regular expression matching engine using multi-character NFA, 2008, 2008 International Conference on Field Programmable Logic and Applications.

[20] Brad L. Hutchings, et al. Assisting network intrusion detection with reconfigurable hardware, 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[21] Stamatis Vassiliadis, et al. Regular expression matching for reconfigurable packet inspection, 2006, 2006 IEEE International Conference on Field Programmable Technology.

[22] Patrick Crowley, et al. Algorithms to accelerate multiple regular expressions matching for deep packet inspection, 2006, SIGCOMM 2006.

[23] Jeffrey D. Ullman, et al. The compilation of regular expressions into integrated circuits, 1980, 21st Annual Symposium on Foundations of Computer Science (sfcs 1980).

[24] Somesh Jha, et al. Backtracking Algorithmic Complexity Attacks against a NIDS, 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).