Area-Efficient Evaluation of a Class of Arithmetic Expressions Using Deeply Pipelined Floating-Point Cores
暂无分享,去创建一个
[1] Viktor K. Prasanna,et al. Efficient Floating-point Based Block LU Decomposition on FPGAs , 2004, ERSA.
[2] Joseph JáJá,et al. An Introduction to Parallel Algorithms , 1992 .
[3] Gary L. Miller,et al. Parallel tree contraction and its application , 1985, 26th Annual Symposium on Foundations of Computer Science (sfcs 1985).
[4] Viktor K. Prasanna,et al. Designing scalable FPGA-based reduction circuits using pipelined floating-point cores , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[5] Karl S. Hemmert,et al. Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[6] Margaret Martonosi,et al. Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques , 2000, IEEE Trans. Computers.
[7] Richard Cole,et al. The accelerated centroid decomposition technique for optimal parallel tree evaluation in logarithmic time , 2005, Algorithmica.
[8] Srinivas Aluru,et al. Parallel algorithms for tree accumulations , 2005, J. Parallel Distributed Comput..
[9] Viktor K. Prasanna,et al. Design tradeoffs for BLAS operations on reconfigurable hardware , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[10] David A. Bader,et al. Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs) (Extended Abstract) , 2002, HiPC.
[11] Viktor K. Prasanna,et al. Sparse Matrix-Vector multiplication on FPGAs , 2005, FPGA '05.
[12] S. Sitharama Iyengar,et al. Introduction to parallel algorithms , 1998, Wiley series on parallel and distributed computing.
[13] Viktor K. Prasanna,et al. Computing Lennard-Jones Potentials and Forces with Reconfigurable Hardware , 2004, ERSA.
[14] C. Siva Ram Murthy,et al. Parallel Arithmetic Expression Evaluation on Reconfigurable Meshes , 1994, Comput. Lang..
[15] Wojciech Rytter,et al. Optimal Parallel Algorithm for Dynamic Expression Evaluation and Context-Free Recognition , 1989, Inf. Comput..
[16] Maya Gokhale,et al. A Preliminary Study of Molecular Dynamics on Reconfigurable Computers , 2003, Engineering of Reconfigurable Systems and Algorithms.
[17] Jaswinder Pal Singh,et al. A parallel Lauritzen-Spiegelhalter algorithm for probabilistic inference , 1994, Proceedings of Supercomputing '94.
[18] Reinhard Männer,et al. Using floating-point arithmetic on FPGAs to accelerate scientific N-Body simulations , 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[19] Viktor K. Prasanna,et al. High-performance FPGA-based general reduction methods , 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).
[20] Viktor K. Prasanna,et al. Cache-friendly implementations of transitive closure , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[21] Viktor K. Prasanna,et al. Scalable and modular algorithms for floating-point matrix multiplication on FPGAs , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..