Reliability Aware Computing Platforms Design and Lifetime Management

Research output: Thesis › Dissertation (TU Delft)

Abstract

Aggressive down-scaling of CMOS technology feature sizes into the deca-nanometer regime, while benefiting performance and yield, has increased the variability of device characteristics w.r.t. their nominal values. This can lead to large spreads in delay, power, and robustness, and makes devices more prone to aging and noise-induced failures during in-field usage. Because power density and the electric field across the transistor gate dielectric keep increasing, the factors that accelerate failure mechanisms in nanoscale Integrated Circuits (ICs) have become more severe than ever, which can cause a higher failure rate during the ICs' useful life and an early onset of aging. As a result, meeting reliability targets at viable cost becomes a significant challenge, one that must be addressed in a unified manner from design time to run time. To this end, we propose a holistic reliability-aware design and lifetime management framework concerned (i) at design time, with providing a reliability-enhanced adaptive architecture fabric, and (ii) at run time, with observing and dynamically managing the fabric's wear-out profile such that user-defined Quality-of-Service requirements are fulfilled, and with maintaining a full-life reliability log to be used as auxiliary information during the design of the next IC generation. Specifically, we first introduce design-time transistor- and circuit-level aging models, which provide the foundation for a 4-dimensional Design Space Exploration (DSE) meant to identify a reliability-optimized circuit realization compliant with area, power, and delay constraints. Subsequently, to enable a low-cost yet accurate fabric observation infrastructure, we propose a methodology to minimize the number of aging sensors to be deployed in a circuit and to identify their locations, and introduce a sensor design able to directly capture the circuit-level amalgamated effects of concomitant degradation mechanisms.
Furthermore, to make the information collected from the sensors meaningful to the run-time management framework, we introduce a circuit-level model that can estimate the overall circuit aging and predict its End-of-Life based on imprecise sensor measurements, while taking degradation nonlinearities into account. Finally, to provide more DSE reliability enhancement options, we focus on the realization of reliable data transport and processing with unreliable components, and propose: (i) a codec for reliable, energy-efficient medium/long-range data transport, and (ii) a methodology to obtain Error Correction Code-protected data processing units with an output error rate smaller than the gate error rate of the fabrication technology.
Original language: English
Awarding Institution
  • Delft University of Technology
Supervisors/Advisors
  • Bertels, Koen, Supervisor
  • Cotofana, S.D., Advisor
Award date: 26 Jan 2017
Print ISBNs: 978-94-6186-780-3
Publication status: Published - 2017

Keywords

  • Reliability
  • Reliability Aware Computation
  • Dynamic Lifetime Reliability Management
  • Reliability Assessment
