Standard

Reliability Aware Computing Platforms Design and Lifetime Management. / Cucu Laurenciu, Nicoleta.

2017. 132 p.

Research output: ThesisDissertation (TU Delft)Scientific

Harvard

APA

Vancouver

Author

BibTeX

@phdthesis{9c46dcd4e68e42dea7ab676a3e631f0d,
title = "Reliability Aware Computing Platforms Design and Lifetime Management",
abstract = "Aggressive CMOS technology feature size down-scaling into the deca nanometer regime, while benefiting performance and yield, determined device characteristics variability increase w.r.t. their nominal values, which can lead to large spreads in delay, power, and robustness, and make devices more prone to aging and noise induced failures during in-field usage. Because of transistor’s gate dielectric increasing power density and electric field the nanoscale Integrated Circuits (ICs) failure mechanisms accelerating factors have become more severe than ever, which can cause higher failure rate during ICs useful life and early aging onset. As a result, meeting the reliability targets with viable costs in this landscape becomes a significant challenge, requiring to be addressed in an unitary manner from design time to run time. To this end, we propose a holistic reliability aware design and lifetime management framework concerned (i) at design time, with providing a reliability enhanced adaptive architecture fabric, and (ii) at run time, with observing and dynamically managing fabric’s wear-out profile such that user defined Quality-of-Service requirements are fulfilled, and with maintaining a full-life reliability log to be utilized as auxiliary information during the next IC generation design. Specifically, we first introduce design time transistor and circuit level aging models, which provide the foundation for a 4-dimensional Design Space Exploration (DSE) meant to identify a reliability optimized circuit realization compliant with area, power, and delay constraints. Subsequently, to enable the creation of a low cost but yet accurate fabric observation infrastructure, we propose a methodology to minimize the number of aging sensors to be deployed in a circuit and identify their location, and introduce a sensor design able to directly capture circuit level amalgamated effects of concomitant degradation mechanisms. Furthermore, to make the information collected from sensors meaningful to the run-time management framework we introduce a circuit level model that can estimate the overall circuit aging and predict its End-of-Life based on imprecise sensors measurements, while taking into account the degradation nonlinearities. Finally, to provide more DSE reliability enhancement options we focus on the realization of reliable data transport and processing with unreliable components, and propose: (i) a codec for reliable energy efficient medium/long range data transport, and (ii) a methodology to obtain Error Correction Codes protected data processing units with an output error rate smaller than the fabrication technology gate error rate.",
keywords = "Reliability, Reliability Aware Computation, Dynamic Lifetime Reliability Management, Reliability Assessment",
author = "{Cucu Laurenciu}, Nicoleta",
year = "2017",
doi = "10.4233/uuid:9c46dcd4-e68e-42de-a7ab-676a3e631f0d",
language = "English",
isbn = "978-94-6186-780-3",

}

RIS

TY - THES

T1 - Reliability Aware Computing Platforms Design and Lifetime Management

AU - Cucu Laurenciu, Nicoleta

PY - 2017

Y1 - 2017

N2 - Aggressive CMOS technology feature size down-scaling into the deca nanometer regime, while benefiting performance and yield, determined device characteristics variability increase w.r.t. their nominal values, which can lead to large spreads in delay, power, and robustness, and make devices more prone to aging and noise induced failures during in-field usage. Because of transistor’s gate dielectric increasing power density and electric field the nanoscale Integrated Circuits (ICs) failure mechanisms accelerating factors have become more severe than ever, which can cause higher failure rate during ICs useful life and early aging onset. As a result, meeting the reliability targets with viable costs in this landscape becomes a significant challenge, requiring to be addressed in an unitary manner from design time to run time. To this end, we propose a holistic reliability aware design and lifetime management framework concerned (i) at design time, with providing a reliability enhanced adaptive architecture fabric, and (ii) at run time, with observing and dynamically managing fabric’s wear-out profile such that user defined Quality-of-Service requirements are fulfilled, and with maintaining a full-life reliability log to be utilized as auxiliary information during the next IC generation design. Specifically, we first introduce design time transistor and circuit level aging models, which provide the foundation for a 4-dimensional Design Space Exploration (DSE) meant to identify a reliability optimized circuit realization compliant with area, power, and delay constraints. Subsequently, to enable the creation of a low cost but yet accurate fabric observation infrastructure, we propose a methodology to minimize the number of aging sensors to be deployed in a circuit and identify their location, and introduce a sensor design able to directly capture circuit level amalgamated effects of concomitant degradation mechanisms. Furthermore, to make the information collected from sensors meaningful to the run-time management framework we introduce a circuit level model that can estimate the overall circuit aging and predict its End-of-Life based on imprecise sensors measurements, while taking into account the degradation nonlinearities. Finally, to provide more DSE reliability enhancement options we focus on the realization of reliable data transport and processing with unreliable components, and propose: (i) a codec for reliable energy efficient medium/long range data transport, and (ii) a methodology to obtain Error Correction Codes protected data processing units with an output error rate smaller than the fabrication technology gate error rate.

AB - Aggressive CMOS technology feature size down-scaling into the deca nanometer regime, while benefiting performance and yield, determined device characteristics variability increase w.r.t. their nominal values, which can lead to large spreads in delay, power, and robustness, and make devices more prone to aging and noise induced failures during in-field usage. Because of transistor’s gate dielectric increasing power density and electric field the nanoscale Integrated Circuits (ICs) failure mechanisms accelerating factors have become more severe than ever, which can cause higher failure rate during ICs useful life and early aging onset. As a result, meeting the reliability targets with viable costs in this landscape becomes a significant challenge, requiring to be addressed in an unitary manner from design time to run time. To this end, we propose a holistic reliability aware design and lifetime management framework concerned (i) at design time, with providing a reliability enhanced adaptive architecture fabric, and (ii) at run time, with observing and dynamically managing fabric’s wear-out profile such that user defined Quality-of-Service requirements are fulfilled, and with maintaining a full-life reliability log to be utilized as auxiliary information during the next IC generation design. Specifically, we first introduce design time transistor and circuit level aging models, which provide the foundation for a 4-dimensional Design Space Exploration (DSE) meant to identify a reliability optimized circuit realization compliant with area, power, and delay constraints. Subsequently, to enable the creation of a low cost but yet accurate fabric observation infrastructure, we propose a methodology to minimize the number of aging sensors to be deployed in a circuit and identify their location, and introduce a sensor design able to directly capture circuit level amalgamated effects of concomitant degradation mechanisms. Furthermore, to make the information collected from sensors meaningful to the run-time management framework we introduce a circuit level model that can estimate the overall circuit aging and predict its End-of-Life based on imprecise sensors measurements, while taking into account the degradation nonlinearities. Finally, to provide more DSE reliability enhancement options we focus on the realization of reliable data transport and processing with unreliable components, and propose: (i) a codec for reliable energy efficient medium/long range data transport, and (ii) a methodology to obtain Error Correction Codes protected data processing units with an output error rate smaller than the fabrication technology gate error rate.

KW - Reliability

KW - Reliability Aware Computation

KW - Dynamic Lifetime Reliability Management

KW - Reliability Assessment

UR - http://resolver.tudelft.nl/uuid:9c46dcd4-e68e-42de-a7ab-676a3e631f0d

U2 - 10.4233/uuid:9c46dcd4-e68e-42de-a7ab-676a3e631f0d

DO - 10.4233/uuid:9c46dcd4-e68e-42de-a7ab-676a3e631f0d

M3 - Dissertation (TU Delft)

SN - 978-94-6186-780-3

ER -

ID: 11204835