Data Management in the Cloud

We are in the midst of a computing revolution. As the cost of provisioning hardware and software stacks grows, and the cost of securing and administering these complex systems grows even faster, we're seeing a shift towards computing clouds. Clouds are essentially services accessed over a network; they offer developers scalable, robust computing infrastructure on a "pay as you go" basis, with the ability to dynamically adjust the amount of "rented" resources and, with it, the bill. For cloud service providers, efficiency comes from amortizing costs across many customers and from averaging out their usage peaks. Internet portals like Yahoo! have long offered application services, such as email, to individuals and organizations. Companies are now offering services such as storage and compute cycles, on top of which higher-level services can be built. In this talk, I will discuss Yahoo!'s vision of cloud computing and describe some of the key initiatives, highlighting the technical challenges involved in designing hosted, multi-tenanted data management systems.
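
The point about averaging usage peaks can be made concrete with a toy calculation. The sketch below is illustrative only: the three tenants and their hourly demand figures (in servers) are hypothetical, chosen so that their busy periods fall at different times. If each tenant provisioned for its own peak, total capacity would be the sum of the individual peaks, whereas a shared provider only needs capacity for the peak of the aggregate demand. The same numbers also show the "pay as you go" effect for a single tenant: paying for the server-hours actually used rather than for a fixed allocation sized to the peak.

```python
# Hypothetical hourly demand (in servers) for three tenants whose busy
# periods fall at different times. Illustrative numbers only.
demand = {
    "tenant_a": [2, 2, 3, 8, 9, 4, 2, 2],
    "tenant_b": [7, 8, 3, 2, 2, 3, 2, 2],
    "tenant_c": [2, 2, 2, 2, 3, 3, 8, 9],
}


def sum_of_peaks(demand):
    """Capacity needed if every tenant provisions for its own peak."""
    return sum(max(series) for series in demand.values())


def peak_of_sum(demand):
    """Capacity a shared provider needs: the peak of aggregate demand."""
    aggregate = [sum(hour) for hour in zip(*demand.values())]
    return max(aggregate)


def fixed_server_hours(series):
    """Server-hours paid for when a fixed allocation is sized to the peak."""
    return max(series) * len(series)


def elastic_server_hours(series):
    """Server-hours paid for when billing tracks actual usage each hour."""
    return sum(series)


if __name__ == "__main__":
    print("sum of per-tenant peaks:", sum_of_peaks(demand))  # 26
    print("peak of aggregate demand:", peak_of_sum(demand))  # 14
    a = demand["tenant_a"]
    print("tenant_a fixed:", fixed_server_hours(a))  # 72
    print("tenant_a elastic:", elastic_server_hours(a))  # 32
```

With these made-up numbers, separately provisioned tenants would need 26 servers while the shared pool peaks at 14, and tenant_a alone would pay for 72 server-hours under fixed provisioning versus 32 under usage-based billing. The gain depends on how correlated the tenants' workloads are: if all peaks coincide, the peak of the sum equals the sum of the peaks and the multiplexing benefit disappears.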