Improving cache lifetime reliability at ultra-low voltages

Voltage scaling is one of the most effective mechanisms to reduce microprocessor power consumption. However, the increased severity of manufacturing-induced parameter variations at lower voltages limits voltage scaling to a minimum voltage, Vccmin, below which a processor cannot operate reliably. Memory cell failures in large memory structures (e.g., caches) typically determine the Vccmin for the whole processor. Memory failures can be persistent (i.e., failures at time zero which cause yield loss) or non-persistent (e.g., soft errors or erratic bit failures). Both types of failures increase as supply voltage decreases and both need to be addressed to achieve reliable operation at low voltages. In this paper, we propose a novel adaptive technique to improve cache lifetime reliability and enable low voltage operation. This technique, multi-bit segmented ECC (MS-ECC) addresses both persistent and non-persistent failures. Like previous work on mitigating persistent failures, MS-ECC trades off cache capacity for lower voltages. However, unlike previous schemes, MS-ECC does not rely on testing to identify and isolate defective bits, and therefore enables error tolerance for non-persistent failures like erratic bits and soft errors at low voltages. Furthermore, MS-ECC's design can allow the operating system to adaptively change the cache size and ECC capability to adjust to system operating conditions. Compared to current designs with single-bit correction, the most aggressive implementation for MS-ECC enables a 30% reduction in supply voltage, reducing power by 71% and energy per instruction by 42%.

[1]  M.A. Horowitz,et al.  A 50% noise reduction interface using low-weight coding , 1996, 1996 Symposium on VLSI Circuits. Digest of Technical Papers.

[2]  K. Roy,et al.  A 160 mV Robust Schmitt Trigger Based Subthreshold SRAM , 2007, IEEE Journal of Solid-State Circuits.

[3]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[4]  J. Jopling,et al.  Erratic fluctuations of sram cache vmin at the 90nm process technology node , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[5]  Mary Jane Irwin,et al.  Neutron-induced soft error rate measurements in semiconductor memories , 2007 .

[6]  Dwijendra K. Ray-Chaudhuri,et al.  Binary mixture flow with free energy lattice Boltzmann methods , 2022, arXiv.org.

[7]  Pradip Bose,et al.  Scaling of Architecture Level Soft Error Rate for Superscalar Processors ∗ , 2005 .

[8]  K. Kumagai,et al.  Investigation of soft error rate including multi-bit upsets in advanced SRAM using neutron irradiation test and 3D mixed-mode device simulation , 2004, IEDM Technical Digest. IEEE International Electron Devices Meeting, 2004..

[9]  Daniel J. Costello,et al.  Error Control Coding, Second Edition , 2004 .

[10]  D. C. Bossen,et al.  Orthogonal latin square codes , 1970 .

[11]  Changhong Dai,et al.  Impact of CMOS process scaling and SOI on the soft error rates of logic processes , 2001, 2001 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.01 CH37184).

[12]  K. Soumyanath,et al.  Impact of body bias on alpha- and neutron-induced soft error rates of flip-flops , 2004, 2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525).

[13]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[14]  Babak Falsafi,et al.  Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[15]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[16]  Wei Liu,et al.  Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories , 2006, 2006 IEEE Workshop on Signal Processing Systems Design and Implementation.

[17]  S. E. Schuster Multiple word/bit line redundancy for semiconductor memories , 1978 .

[18]  Chin-Long Chen,et al.  Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review , 1984, IBM J. Res. Dev..

[19]  Alaa R. Alameldeen,et al.  Trading off Cache Capacity for Reliability to Enable Low Voltage Operation , 2008, 2008 International Symposium on Computer Architecture.

[20]  Trevor Mudge,et al.  On-Chip Cache Device Scaling Limits and Effective Fault Repair Techniques in Future Nanoscale Technology , 2007 .

[21]  cristian. constantinescu Impact of Intermittent Faults on Nanocomputing Devices , 2007 .

[22]  Pradip Bose,et al.  The case for lifetime reliability-aware microprocessors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[23]  J. Meindl,et al.  The impact of intrinsic device fluctuations on CMOS SRAM cell stability , 2001, IEEE J. Solid State Circuits.

[24]  Huntington W. Curtis,et al.  Accelerated testing for cosmic soft-error rate , 1996, IBM J. Res. Dev..

[25]  Young-Hyun Jun,et al.  An 80nm 4Gb/s/pin 32b 512Mb GDDR4 Graphics DRAM with Low-Power and Low-Noise Data-Bus Inversion , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[26]  Yuan Taur,et al.  Fundamentals of Modern VLSI Devices , 1998 .

[27]  Diana Marculescu,et al.  Soft error rate reduction using redundancy addition and removal , 2008, 2008 Asia and South Pacific Design Automation Conference.

[28]  Jack Doweck,et al.  Inside Intel® Core microarchitecture , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).

[29]  Shu Lin,et al.  Error Control Coding , 2004 .

[30]  Eisse Mensink,et al.  A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[31]  Georg Georgakos,et al.  Soft Error Rates in 65nm SRAMs--Analysis of new Phenomena , 2007, 13th IEEE International On-Line Testing Symposium (IOLTS 2007).

[32]  Joel S. Emer,et al.  The soft error problem: an architectural perspective , 2005, 11th International Symposium on High-Performance Computer Architecture.