Physically-Adaptive Computing via Introspection and Self-Optimization in Reconfigurable Systems

Digital electronic systems typically must compute precise, deterministic results, but in principle they have flexibility in how they compute. Despite this potential flexibility, the prevailing paradigm for more than 50 years has been based on fixed, non-adaptive integrated circuits. This one-size-fits-all approach is rapidly losing effectiveness as technology advances into the nanoscale. Physical variation and uncertainty in component behavior are emerging as fundamental constraints, leading to increasingly sub-optimal fault rates, power consumption, chip costs, and lifetimes.

This dissertation proposes methods of physically-adaptive computing (PAC), in which reconfigurable electronic systems sense and learn their own physical parameters and adapt with fine granularity in the field, leading to higher reliability and efficiency. We formulate the PAC problem and provide a conceptual framework built around two major themes: introspection and self-optimization. We investigate how systems can efficiently acquire useful information about their physical state and related parameters, and how they can feasibly re-implement their designs on the fly using the information learned. We study the role not only of self-adaptation, where both tasks are performed by the adaptive system itself, but also of assisted adaptation using a remote server or peer.

We introduce low-cost methods for sensing regional variations in a system, including a flexible, ultra-compact sensor that can be embedded in an application and implemented on field-programmable gate arrays (FPGAs). An array of such sensors, with only 1% total overhead, can be employed to gain useful information about circuit delays, voltage noise, and even leakage variations. We present complementary methods of regional self-optimization, such as finding a design alternative that best fits a given system region. We also propose a novel approach to characterizing local, uncorrelated variations. Through in-system emulation of noise, previously hidden variations in transient fault susceptibility are uncovered. Correspondingly, we demonstrate practical methods of self-optimization, such as local re-placement, informed by the introspection data.

Forms of physically-adaptive computing are strongly needed in areas such as communications infrastructure, data centers, and space systems. This dissertation contributes practical methods for improving PAC costs and benefits, and promotes a vision of resourceful, dependable digital systems at unimaginably fine physical scales.
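The idea of regional self-optimization, choosing the design alternative that best fits a given system region, can be illustrated with a minimal sketch. It is not the dissertation's method. It assumes, purely for illustration, that introspection yields one delay scale factor per region (1.0 means nominal speed) and that each pre-compiled alternative has a known nominal critical-path delay and power figure. The names `Alternative`, `pick_alternative`, `plan`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Alternative:
    """A hypothetical pre-compiled implementation of the same logic function."""
    name: str
    nominal_delay_ns: float  # critical-path delay on nominal silicon (assumed known)
    power_mw: float          # estimated power (assumed known)


def pick_alternative(alternatives: List[Alternative],
                     delay_scale: float,
                     clock_period_ns: float) -> Optional[Alternative]:
    """Return the lowest-power alternative that meets timing in one region.

    delay_scale is the sensed slowdown of the region relative to nominal
    (an assumed form of introspection data). Returns None if nothing fits.
    """
    feasible = [a for a in alternatives
                if a.nominal_delay_ns * delay_scale <= clock_period_ns]
    return min(feasible, key=lambda a: a.power_mw, default=None)


# Illustrative data only: three alternatives and four sensed regions.
ALTERNATIVES = [
    Alternative("low_power", nominal_delay_ns=9.0, power_mw=40.0),
    Alternative("balanced", nominal_delay_ns=8.0, power_mw=55.0),
    Alternative("fast", nominal_delay_ns=6.5, power_mw=80.0),
]
REGION_DELAY_SCALE = {"r0": 0.95, "r1": 1.20, "r2": 1.30, "r3": 1.60}


def plan(clock_period_ns: float = 10.0) -> Dict[str, Optional[str]]:
    """Map each region to the name of its chosen alternative (or None)."""
    result = {}
    for region, scale in REGION_DELAY_SCALE.items():
        choice = pick_alternative(ALTERNATIVES, scale, clock_period_ns)
        result[region] = choice.name if choice else None
    return result


if __name__ == "__main__":
    for region, name in plan().items():
        print(region, "->", name)
```

In this toy setting, faster regions can host the cheaper implementation, while slower regions need a faster one or cannot host the function at the target clock period at all. A real system would also have to account for reconfiguration cost and measurement uncertainty.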
