Energy-efficient cache design using variable-strength error-correcting codes

Voltage scaling is one of the most effective mechanisms to improve microprocessors' energy efficiency. However, processors cannot operate reliably below a minimum voltage, Vccmin, since hardware structures may fail. Cell failures in large memory arrays (e.g., caches) typically determine Vccmin for the whole processor. We observe that most cache lines exhibit zero or one failures at low voltages. However, a few lines, especially in large caches, exhibit multi-bit failures and increase Vccmin. Previous solutions either significantly reduce cache capacity to enable uniform error correction across all lines, or significantly increase latency and bandwidth overheads when amortizing the cost of error-correcting codes (ECC) over large lines. In this paper, we propose a novel cache architecture that uses variable-strength error-correcting codes (VS-ECC). In the common case, lines with zero or one failures use a simple and fast ECC. A small number of lines with multi-bit failures use a strong multi-bit ECC that requires some additional area and latency. We present a novel dynamic cache characterization mechanism to determine which lines will exhibit multi-bit failures. In particular, we use multi-bit correction to protect a fraction of the cache after switching to low voltage, while dynamically testing the remaining lines for multi-bit failures. Compared to prior multi-bit-correcting proposals, VS-ECC significantly reduces power and energy, avoids significant reductions in cache capacity, incurs little area overhead, and avoids large increases in latency and bandwidth.
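The observation that most lines exhibit zero or one failures, with only a few multi-bit lines, can be illustrated with a small calculation. The following is a minimal sketch, not the paper's method: it assumes independent cell failures, a hypothetical 512-bit (64-byte) line, and a hypothetical per-cell failure probability of 1e-4. None of these values come from the abstract. The function names `line_failure_distribution` and `assign_ecc` are illustrative only.

```python
from math import comb


def line_failure_distribution(bits_per_line: int, p_cell: float) -> list[float]:
    """Return [P(0 failures), P(1 failure), P(2 or more failures)] for one line.

    Assumes each cell fails independently with probability p_cell
    (a simplifying assumption, not a claim from the paper).
    """
    probs = [
        comb(bits_per_line, k) * p_cell**k * (1.0 - p_cell) ** (bits_per_line - k)
        for k in range(2)
    ]
    probs.append(1.0 - sum(probs))  # everything that is not 0 or 1 failures
    return probs


def assign_ecc(failures_per_line: list[int]) -> list[str]:
    """Pick an ECC strength per line: simple for 0 or 1 failures, strong otherwise."""
    return ["strong" if f >= 2 else "simple" for f in failures_per_line]


# Hypothetical parameters: 512-bit line, per-cell failure probability 1e-4.
p0, p1, p_multi = line_failure_distribution(512, 1e-4)
print(f"P(0)={p0:.4f}  P(1)={p1:.4f}  P(>=2)={p_multi:.4f}")

# Hypothetical per-line failure counts from a characterization pass.
print(assign_ecc([0, 1, 2, 5]))
```

Under these assumed parameters, the multi-bit case is a small fraction of all lines, which is the situation in which reserving the strong code for a few lines, rather than applying it uniformly, becomes attractive.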
