SIMPO

As CPU architectures incorporate ever more cores to meet ever-larger workloads, advances in fault-tolerance support are indispensable for sustaining system performance under reliability constraints. Emerging non-volatile memory technologies are yielding fast, dense, and energy-efficient NVRAM that can displace SSDs for persisting data, and research on using NVRAM for fast in-memory data persistence is ongoing. In this work, we design and implement a persistent object framework, dubbed scalable in-memory persistent object (SIMPO), which exploits NVRAM alongside DRAM to support efficient object persistence in highly threaded big data applications. Building on operation logging, we propose a new programming model that classifies functions into instant and deferrable groups. SIMPO features a streamlined execution model that allows lazy evaluation of deferrable functions and is well suited to big data computing workloads, which benefit from improved data locality and concurrency. Our log recording and checkpointing scheme is optimized for NVRAM, mitigating its long write latency through write-combining and consolidated flushing techniques. Efficient persistent object management, with features including safe references and memory leak prevention, is also implemented and tailored to NVRAM. We evaluate a wide range of SIMPO-enabled applications with machine learning, high-performance computing, and database workloads on an emulated hybrid memory architecture and on a real hybrid memory machine with NVDIMM. Compared with native applications without persistence, experimental results show that SIMPO incurs less than 5% runtime overhead on both platforms and, in highly threaded situations, even gains up to 2.5× speedup and an 84% increase in throughput on the two platforms, respectively, thanks to the streamlined execution model.
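The split between instant and deferrable functions can be illustrated with a minimal sketch. It is not SIMPO's API: the class `LoggedObject`, its methods `defer`, `instant`, and `flush`, and the toy `put`/`get` operations are all hypothetical names, and an ordinary Python list stands in for an NVRAM-resident operation log. The sketch assumes that a deferrable call only appends a log record, while an instant call first applies all pending records in one batch and then returns its result.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class LoggedObject:
    """Toy object that logs deferrable operations and evaluates them lazily.

    Hypothetical illustration only; the list `log` stands in for an
    operation log that a real system would keep in NVRAM.
    """
    state: dict = field(default_factory=dict)
    log: List[Tuple[str, tuple]] = field(default_factory=list)

    def defer(self, op: str, *args: Any) -> None:
        # Deferrable call: record the operation, leave the state untouched.
        self.log.append((op, args))

    def instant(self, op: str, *args: Any) -> Any:
        # Instant call: the caller needs a result now, so apply every
        # pending operation first (assumption of this sketch), then run it.
        self.flush()
        return getattr(self, "_" + op)(*args)

    def flush(self) -> int:
        # Apply all pending operations in log order as one batch and
        # return how many were applied.
        applied = len(self.log)
        for op, args in self.log:
            getattr(self, "_" + op)(*args)
        self.log.clear()
        return applied

    def _put(self, key: str, value: Any) -> None:
        self.state[key] = value

    def _get(self, key: str) -> Any:
        return self.state.get(key)
```

In this sketch, several `defer("put", ...)` calls cost only a log append each, and the batch is applied when `instant("get", ...)` is next called. Batching the pending work in this way is one plausible reading of how lazy evaluation could improve locality; the framework's actual mechanism may differ.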
