Multi-Core Architecture on FPGA for Large Dictionary String Matching

FPGA has long been considered an attractive platform for high performance implementations of string matching. However, as the size of pattern dictionaries continues to grow, such large dictionaries can be stored in external DRAM only. The increased memory latency and limited bandwidth pose new challenges to FPGA-based designs, and the lack of spatial and temporal locality in data access also leads to low utilization of memory bandwidth. In this paper, we propose a multi-core architecture on FPGA to address these challenges. We adopt the popular Aho-Corasick (AC-opt) algorithm for our string matching engine. Utilizing the data access feature in this algorithm, we design a specialized BRAM buffer for the cores to exploit a data reuse existing in such applications. Several design optimization techniques are utilized to realize a simple design with high clock rate for the string matching engine. An implementation of a 2-core system with one shared BRAM buffer on a Virtex-5 LX155 achieves up to 3.2 Gbps throughput on a 64 MB state transition table stored in DRAM. Performance of systems with more cores is also evaluated for this architecture, and a throughput of over 5.5 Gbps can be obtained for some application scenarios.

[1]  Evangelos P. Markatos,et al.  Performance analysis of content matching intrusion detection systems , 2004, 2004 International Symposium on Applications and the Internet. Proceedings..

[2]  Bruce W. Watson,et al.  The performance of single-keyword and multiple-keyword pattern matching algorithms , 1994 .

[3]  Viktor K. Prasanna,et al.  A computationally efficient engine for flexible intrusion detection , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[4]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM.

[5]  J. Gregory Steffan,et al.  Scaling Soft Processor Systems , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.

[6]  Fabrizio Petrini,et al.  Exact multi-pattern string matching on the cell/b.e. processor , 2008, CF '08.

[7]  Dhiraj K. Pradhan,et al.  A Routing-Aware ILS Design Technique , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[8]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[9]  Chita R. Das,et al.  Memory-efficient content filtering hardware for high-speed intrusion detection systems , 2007, SAC '07.

[10]  Viktor K. Prasanna,et al.  A SRAM-based Architecture for Trie-based IP Lookup Using FPGA , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.

[11]  T. V. Lakshman,et al.  Fast and memory-efficient regular expression matching for deep packet inspection , 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[12]  Kurt Keutzer,et al.  An FPGA-based soft multiprocessor system for IPv4 packet forwarding , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[13]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[14]  John W. Lockwood,et al.  Deep packet inspection using parallel bloom filters , 2004, IEEE Micro.