Reducing cache power with low-cost, multi-bit error-correcting codes

Technology advancements have enabled the integration of large on-die embedded DRAM (eDRAM) caches. eDRAM is significantly denser than traditional SRAMs, but must be periodically refreshed to retain data. Like SRAM, eDRAM is susceptible to device variations, which play a role in determining refresh time for eDRAM cells. Refresh power potentially represents a large fraction of overall system power, particularly during low-power states when the CPU is idle. Future designs need to reduce cache power without incurring the high cost of flushing cache data when entering low-power states. In this paper, we show the significant impact of variations on refresh time and cache power consumption for large eDRAM caches. We propose Hi-ECC, a technique that incorporates multi-bit error-correcting codes to significantly reduce refresh rate. Multi-bit error-correcting codes usually have a complex decoder design and high storage cost. Hi-ECC avoids the decoder complexity by using strong ECC codes to identify and disable sections of the cache with multi-bit failures, while providing efficient single-bit error correction for the common case. Hi-ECC includes additional optimizations that allow us to amortize the storage cost of the code over large data words, providing the benefit of multi-bit correction at same storage cost as a single-bit error-correcting (SECDED) code (2% overhead). Our proposal achieves a 93% reduction in refresh power vs. a baseline eDRAM cache without error correcting capability, and a 66% reduction in refresh power vs. a system using SECDED codes.

[1]  Wei Kong,et al.  Analysis of Retention Time Distribution of Embedded DRAM - A New Method to Characterize Across-Chip Threshold Voltage Variation , 2008, 2008 IEEE International Test Conference.

[2]  Eiji Fujiwara,et al.  Error-control coding for computer systems , 1989 .

[3]  Alaa R. Alameldeen,et al.  Trading off Cache Capacity for Reliability to Enable Low Voltage Operation , 2008, 2008 International Symposium on Computer Architecture.

[4]  D. Strukov,et al.  The area and latency tradeoffs of binary bit-parallel BCH decoders for prospective nanoelectronic memories , 2006, 2006 Fortieth Asilomar Conference on Signals, Systems and Computers.

[5]  Doe Hyun Yoon,et al.  Memory mapped ECC: low-cost error protection for last level caches , 2009, ISCA '09.

[6]  Eric Rotenberg,et al.  Retention-aware placement in DRAM (RAPID): software methods for quasi-non-volatile DRAM , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[7]  R. Chau,et al.  A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging , 2007, 2007 IEEE International Electron Devices Meeting.

[8]  Varghese George 45nm next generation Intel® Core™ microarchitecture (Penryn) , 2007, 2007 IEEE Hot Chips 19 Symposium (HCS).

[9]  Richard E. Matick,et al.  A 500 MHz Random Cycle, 1.5 ns Latency, SOI Embedded DRAM Macro Featuring a Three-Transistor Micro Sense Amplifier , 2008, IEEE Journal of Solid-State Circuits.

[10]  Mark Y. Liu,et al.  A 32nm logic technology featuring 2nd-generation high-k + metal-gate transistors, enhanced channel strain and 0.171μm2 SRAM cell size in a 291Mb array , 2008, 2008 IEEE International Electron Devices Meeting.

[11]  James L. Massey,et al.  Step-by-step decoding of the Bose-Chaudhuri- Hocquenghem codes , 1965, IEEE Trans. Inf. Theory.

[12]  Hai Zhou,et al.  Yield-Aware Cache Architectures , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[13]  E. J. Weldon,et al.  Cyclic product codes , 1965, IEEE Trans. Inf. Theory.

[14]  Hsien-Hsin S. Lee,et al.  Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[15]  Fatih Hamzaoglu,et al.  Multi-Phase 1 GHz Voltage Doubler Charge Pump in 32 nm Logic Process , 2010, IEEE Journal of Solid-State Circuits.

[16]  Elwyn R. Berlekamp,et al.  Algebraic coding theory , 1984, McGraw-Hill series in systems science.

[17]  Ram Huggahalli,et al.  Impact of Cache Coherence Protocols on the Processing of Network Traffic , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[18]  Wei Chen,et al.  The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series , 2007, IEEE Journal of Solid-State Circuits.

[19]  Shu Lin,et al.  Error Control Coding , 2004 .

[20]  Daniel J. Costello,et al.  Error Control Coding, Second Edition , 2004 .

[21]  Kazuaki Murakami,et al.  Optimizing the DRAM refresh count for merged DRAM/logic LSIs , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[22]  Alon Naveh,et al.  Power and Thermal Management in the Intel Core Duo Processor , 2006 .

[23]  Douglas C. Bossen,et al.  Orthogonal Latin Square Configuration for LSI Memory Yield and Reliability Enhancement , 1975, IEEE Transactions on Computers.

[24]  B.C. Paul,et al.  Process variation in embedded memories: failure analysis and variation aware architecture , 2005, IEEE Journal of Solid-State Circuits.

[25]  Marios C. Papaefthymiou,et al.  Block-based multiperiod dynamic memory design for low data-retention power , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[26]  Philip G. Emma,et al.  Rethinking Refresh: Increasing Availability and Reducing Power in DRAM for Cache Applications , 2008, IEEE Micro.

[27]  Richard E. Matick,et al.  Logic-based eDRAM: Origins and rationale for use , 2005, IBM J. Res. Dev..

[28]  Babak Falsafi,et al.  Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[29]  Trevor N. Mudge,et al.  On-Chip Cache Device Scaling Limits and Effective Fault Repair Techniques in Future Nanoscale Technology , 2007, 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007).

[30]  Robert T. Chien,et al.  Cyclic decoding procedures for Bose- Chaudhuri-Hocquenghem codes , 1964, IEEE Trans. Inf. Theory.

[31]  Fatih Hamzaoglu,et al.  Multi-phase 1GHz voltage doubler charge-pump in 32nm logic process , 2009, 2009 Symposium on VLSI Circuits.

[32]  Jose Renau,et al.  Effective Optimistic-Checker Tandem Core Design through Architectural Pruning , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[33]  T. Hamamoto,et al.  On the retention time distribution of dynamic random access memory (DRAM) , 1998 .

[34]  Andreas Curiger,et al.  On Computing Multiplicative Inverses in GF(2^m) , 1993, IEEE Trans. Computers.

[35]  Marios C. Papaefthymiou,et al.  Dynamic Memory Design for Low Data-Retention Power , 2000, PATMOS.