Towards a federated Metropolitan Area Grid environment: The SCoPE network-aware infrastructure

Grid computing offers an effective approach and infrastructure for coordinated resource sharing, problem solving and service integration in dynamic, multi-institutional virtual organizations, which often span several distant sites within a large urban or regional area. In this paper, we discuss the opportunities and challenges in designing the Grid resource management infrastructure and the network control plane, both critical to providing network-assisted, extensible Grid services on the metropolitan scale. Such services can enable a true high-performance distributed computing system built on optical transport networks that are administered within a single domain and offer abundant, inexpensive bandwidth to e-science applications. This approach makes the transport infrastructure the main enabling factor of a novel Grid vision, the "Metropolitan Area Grid" (MAG), which aims to unify many geographically distributed, federated computational and storage resources under a common "virtual site" abstraction, so that they can cooperate as if they were in the same server farm and local area network. Simply stated, the MAG concept aims to make applications running on the metro Grid infrastructure aware of their complete computational and networking environment and its capabilities, and able to make dynamic, adaptive and optimized use of the heterogeneous network infrastructures connecting various high-end resources. As a proof of concept, we built a prototype of a basic MAG architecture within the SCoPE High Performance Computing environment by implementing a novel centralized network resource management service that supports a flexible Grid-application interface and several effective network resource reservation facilities.
