Scalable high-throughput architecture for large balanced tree structures on FPGA (abstract only)

Architectures for tree structures have been proposed over the years for both FPGAs and ASICs. Because the memory required grows exponentially with the number of tree levels, limited on-chip memory restricts the scalability of these architectures, and large trees must use off-chip memory. We propose a pipelined FPGA architecture for large balanced tree structures that achieves both scalability and high throughput. In the proposed architecture, each tree level is mapped onto one or more Processing Elements (PEs) using dual-port distributed RAM, dual-port block RAM (BRAM), and off-chip RAM. We parameterize the pipeline architecture and optimize its performance with respect to scalability and throughput. The resulting search-tree architecture is dual-threaded and deeply pipelined: it accepts two search requests per clock cycle and operates at a clock rate of 280 MHz. Post place-and-route results show that, using only 17% of the logic resources and 9% of the BRAM available on a state-of-the-art FPGA, our dual-threaded pipelined search tree performs 560 million search operations per second on a tree containing 512K 64-bit keys.
