High Performance Computing

Semiconductor fabrication technology that tightly couples high-density DRAM with CMOS logic on the same chip has given rise to an important new class of Processor-in-Memory (PIM) architectures. Recent developments provide powerful on-chip parallel processing, exploiting the ability to load wide words in a single memory access and to perform complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into massively parallel architectures. In this paper, we outline the salient features of PIM architectures and describe the design of an object-based programming and execution model centered on the notion of macroservers. While generally adhering to the conventional framework of object-based computation, macroservers provide special support for efficient control of program execution in a PIM array. This support includes features for specifying the distribution and alignment of data in virtual object space, the binding of threads to data, and a future-based synchronization mechanism. We provide a number of motivating examples and give a short overview of implementation considerations.
