Improving the reliability of on-chip data caches under process variations

On-chip caches take a large portion of the chip area. They are much more vulnerable to parameter variation than smaller units. As leakage current becomes a significant component of the total power consumption, the leakage current variations induced thermal and reliability problem to the on-chip caches become an important design concern. This paper studies the impact of process variations, particular the leakage variations, on the temperature and reliability of on-chip caches. Our statistical simulation shows that, under process variation, 85% of the caches see shortened lifetime, with average lifetime being 81.6% of the ideal cache. At runtime, unevenly distributed dynamic power and the corresponding thermal variation would further deteriorate the situation. To mitigate this problem, we propose a dynamic cache subarray permutation scheme that can alleviate the thermal stress on a high-leakage area to improve the reliability of the caches. Experiments on 17 Spec2k benchmarks show that our scheme can extend the cache lifetime by up to 20.3%, and reduce the peak temperature by 7 degrees on average and more on data-intensive applications.

[1]  K. Roy,et al.  Modeling and estimation of failure probability due to parameter variations in nano-scale SRAMs for yield enhancement , 2004, 2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525).

[2]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[3]  James Tschanz,et al.  Parameter variations and impact on circuits and microarchitecture , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[4]  Norman P. Jouppi,et al.  Cacti 3. 0: an integrated cache timing, power, and area model , 2001 .

[5]  Kaushik Roy,et al.  Statistical timing analysis using levelized covariance propagation considering systematic and random variations of process parameters , 2006, TODE.

[6]  Yehea I. Ismail,et al.  Thermal Management of On-Chip Caches Through Power Density Minimization , 2007, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[7]  David Blaauw,et al.  Statistical analysis of subthreshold leakage current for VLSI circuits , 2004, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[8]  Sarita V. Adve,et al.  The impact of technology scaling on lifetime reliability , 2004, International Conference on Dependable Systems and Networks, 2004.

[9]  S. G. Duvall,et al.  Statistical circuit modeling and optimization , 2000, 2000 5th International Workshop on Statistical Metrology (Cat.No.00TH8489.

[10]  Pradip Bose,et al.  Exploiting structural duplication for lifetime reliability enhancement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[11]  Kaushik Roy,et al.  A novel fault tolerant cache to improve yield in nanometer technologies , 2004, Proceedings. 10th IEEE International On-Line Testing Symposium.

[12]  M. Horiguchi,et al.  Redundancy techniques for high-density DRAMs , 1997, 1997 Proceedings Second Annual IEEE International Conference on Innovative Systems in Silicon.

[13]  Kevin Skadron,et al.  A Case for Thermal-Aware Floorplanning at the Microarchitectural Level , 2005, J. Instr. Level Parallelism.

[14]  Jie S. Hu,et al.  Optimizing the thermal behavior of subarrayed data caches , 2005, 2005 International Conference on Computer Design.

[15]  Wei Wu,et al.  Fast thermal simulation for architecture level dynamic thermal management , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[16]  Kevin Skadron,et al.  Temperature-aware microarchitecture , 2003, ISCA '03.

[17]  Narayanan Vijaykrishnan,et al.  ChipPower: an architecture-level leakage simulator , 2004, IEEE International SOC Conference, 2004. Proceedings..

[18]  Mircea R. Stan,et al.  System level leakage reduction considering the interdependence of temperature and leakage , 2004, Proceedings. 41st Design Automation Conference, 2004..

[19]  Ke Meng,et al.  Process Variation Aware Cache Leakage Management , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[20]  Costas J. Spanos,et al.  Modeling within-die spatial correlation effects for process-design co-optimization , 2005, Sixth international symposium on quality electronic design (isqed'05).