Blind Identification of Thermal Models and Power Sources From Thermal Measurements

The ability to sense the temperatures and power consumption of key components of a chip is central to the operation of modern integrated circuits, such as processors. While modern chips often include a number of embedded thermal sensors, they lack the ability to sense power at fine granularity. This paper proposes a new approach that simultaneously identifies the thermal model and the fine-grain power consumption of a chip from only the thermal sensor measurements and the total power consumption. Our identification technique is blind in that it requires no design knowledge of the thermal-power model to identify the power sources. We investigate the main challenges in blind identification, namely the permutation and scaling ambiguities, and propose novel techniques to resolve them. We implement our technique and apply it in three contexts. First, we implement it within a controlled simulation environment, which enables us to verify its accuracy and analyze its sensitivity to factors such as measurement noise and the number of available training samples. Second, we apply it to a real multi-core CPU+GPU processor-based system, where we identify the runtime power consumption of the individual cores under different workloads using only the total power measurement and the measurements of the embedded thermal sensors. Third, we apply it to non-invasive power sensing of chips by inverting the temperatures measured with an external infrared imaging camera. We show that our technique consistently improves the modeling and sensing accuracy of integrated circuits.
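
To make the two ambiguities concrete, the sketch below assumes a noise-free, steady-state linear model in which the sensor temperatures factor as T = R P, with R a thermal resistance matrix and P the per-source power traces. This model, the synthetic data, and all names (R_true, P_blind, resolve_scaling, and so on) are illustrative assumptions and not the identification procedure of the paper. The sketch shows that a reordered and rescaled factor pair reproduces the same temperatures, and that, under these assumptions, a total power measurement is enough to fix the per-source scaling by least squares. The permutation is left unresolved here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources, n_samples = 6, 3, 200

# Hypothetical ground truth, unknown to a blind method.
R_true = rng.uniform(0.1, 1.0, size=(n_sensors, n_sources))  # thermal resistances
P_true = rng.uniform(0.5, 5.0, size=(n_sources, n_samples))  # per-source power

# The only quantities assumed to be measurable.
T = R_true @ P_true            # thermal sensor readings
p_total = P_true.sum(axis=0)   # total chip power

# A blind factorization is defined only up to source order and scale:
# this reordered, rescaled pair explains the temperatures equally well.
perm = np.array([2, 0, 1])
scale = np.array([0.5, 3.0, 1.7])
P_blind = scale[:, None] * P_true[perm]
R_blind = R_true[:, perm] / scale
assert np.allclose(R_blind @ P_blind, T)


def resolve_scaling(P_blind, p_total):
    """Find per-source gains s so that sum_k s[k] * P_blind[k] matches p_total."""
    s, *_ = np.linalg.lstsq(P_blind.T, p_total, rcond=None)
    return s


s = resolve_scaling(P_blind, p_total)
P_est = s[:, None] * P_blind   # powers in physical units, still permuted
R_est = R_blind / s            # matching rescaled thermal model
```

With measurement noise the recovered gains would only be approximate, and a non-negativity constraint on s may be preferable to plain least squares.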
