Area-Efficient Arithmetic Expression Evaluation Using Deeply Pipelined Floating-Point Cores

Recently, it has become possible to implement floating-point cores on field-programmable gate arrays (FPGAs), providing acceleration for the many applications that require high-performance floating-point arithmetic. To achieve high clock rates, floating-point cores on FPGAs must be deeply pipelined, which makes it difficult to reuse a single core for a series of dependent computations. At the same time, floating-point cores consume a great deal of area, so it is important to use as few of them as possible. In this paper, we describe area-efficient architectures and algorithms for arithmetic expression evaluation. Such evaluation is required in applications from a wide variety of fields, including scientific computing and cognition. The proposed designs effectively hide the pipeline latency of the floating-point cores and use at most two cores for each type of operator that appears in the expression. While best suited to particular classes of expressions, the proposed designs can also evaluate general expressions, and multiple expressions can be evaluated without reconfiguration. Experimental results show that the area of our designs grows linearly with the number of operator types in the expression, and that our designs occupy less area and achieve higher throughput than designs generated by a commercial hardware compiler.
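To illustrate why hiding pipeline latency matters, the following sketch (an illustrative software model, not the paper's actual architecture) counts issue cycles for reducing a list of values with a hypothetical k-stage pipelined adder. The adder can accept one new addition per cycle, but each result is only available k cycles later, so a naive sequential reduction stalls on every dependent add. Interleaving k independent partial sums keeps the pipeline full; `LATENCY` and both function names are assumptions chosen for this example.

```python
LATENCY = 8  # hypothetical pipeline depth of the floating-point adder

def naive_reduce(values):
    """Sequential reduction: each add depends on the previous result,
    so the pipeline stalls for LATENCY cycles per input."""
    total, cycles = 0.0, 0
    for v in values:
        total += v
        cycles += LATENCY  # must wait for the result before issuing the next add
    return total, cycles

def latency_hiding_reduce(values):
    """Keep LATENCY independent partial sums so a new, independent add
    can be issued every cycle, then combine the partials at the end."""
    partial = [0.0] * LATENCY
    cycles = 0
    for i, v in enumerate(values):
        partial[i % LATENCY] += v  # adds to different slots are independent
        cycles += 1                # one add issued per cycle, no stalls
    cycles += LATENCY              # drain the pipeline
    total = 0.0
    for p in partial:              # short sequential combine of the partials
        total += p
        cycles += LATENCY
    return total, cycles
```

For 1000 inputs this model issues roughly 8x fewer cycles in the latency-hiding version, since stalls occur only during the short final combine rather than on every input.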
