Design space exploration using arithmetic-level hardware-software cosimulation for configurable multiprocessor platforms

Configurable multiprocessor platforms consist of multiple soft processors configured on FPGA devices, and they have become an attractive choice for implementing many computing applications. In addition to choosing how to distribute software execution among the soft processors, the application designer can customize the soft processors and the connections between them to improve the performance of applications running on the platform. State-of-the-art design tools rely on low-level simulation to explore the design trade-offs offered by configurable multiprocessor platforms. These low-level, simulation-based exploration techniques are too time-consuming and can be a major bottleneck to efficient design space exploration on these platforms. We propose a design space exploration technique for configurable multiprocessor platforms based on arithmetic-level, cycle-accurate hardware-software cosimulation. Arithmetic-level abstractions of the hardware and software execution platforms are created within the proposed cosimulation environment, and the configurable multiprocessor platforms are described using these abstractions. Hardware and software simulators are tightly integrated to concurrently simulate the arithmetic behavior of the multiprocessor platform. The simulations within the integrated simulators are synchronized to provide cycle-accurate results for the complete multiprocessor platform. By doing so, we significantly speed up the cosimulation process for configurable multiprocessor platforms. The various hardware-software design trade-offs offered by these platforms can then be explored within the proposed cycle-accurate cosimulation environment. After the final designs are identified, the corresponding low-level implementations with the desired cycle-accurate arithmetic behavior are generated automatically.
For illustrative purposes, we provide an implementation of our approach based on MATLAB/Simulink. We show the cosimulation of two numerical computation applications and one image-processing application on a popular configurable multiprocessor platform within the MATLAB/Simulink-based cosimulation environment. For these three applications, our arithmetic-level cosimulation approach reduces simulation time by a factor of more than 800 in the best case, compared with low-level simulation approaches. The designs of these applications identified using our arithmetic-level cosimulation approach achieve execution-time speed-ups of up to 5.6x compared with the other designs considered in our experiments.
