Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors

A. L. Sartor; A. F. Lorenzon; Luigi Carro; Fernanda Kastensmidt; S. Wong; Antonio C.S. Beck

doi:10.1145/3001935

Computer Engineering

Research output: Contribution to journal › Special issue › Scientific › peer-review

11 Citations (Scopus)

Abstract

Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low overhead fault tolerance approaches based on instruction duplication with zero latency detection, which uses a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses idle issue slots within a period of time to execute extra instructions considering distinct application phases. The second works at a finer grain, adaptively exploiting idle functional units at run-time. However, some applications present high instruction-level parallelism (ILP), so the ability to provide fault tolerance is reduced: fewer functional units will be idle, decreasing the number of potential duplicated instructions. The third approach attacks this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the latter improves fault tolerance with low performance degradation. All approaches are evaluated considering area, performance, power dissipation, and error coverage.
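The idea of duplicating instructions into idle issue slots can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware mechanism evaluated in the paper: a bundle is modeled as a list of issue slots, `None` stands for an idle slot, and the names `fill_idle_slots` and `coverage` are hypothetical. It shows why a bundle with high ILP leaves fewer slots available for redundant copies.

```python
from typing import List, Optional, Tuple

NOP = None  # assumption: an idle issue slot is modeled as None


def fill_idle_slots(bundle: List[Optional[str]]) -> Tuple[List[Optional[str]], int]:
    """Copy instructions of one bundle into its idle slots (toy model).

    Returns the padded bundle and how many instructions were duplicated.
    """
    padded = list(bundle)
    originals = [op for op in bundle if op is not NOP]
    idle = [i for i, op in enumerate(bundle) if op is NOP]
    duplicated = 0
    # zip stops at the shorter list, so at most one copy per original
    # instruction and at most one copy per idle slot.
    for slot, op in zip(idle, originals):
        padded[slot] = op + "'"  # primed name marks the redundant copy
        duplicated += 1
    return padded, duplicated


def coverage(bundles: List[List[Optional[str]]]) -> float:
    """Fraction of original instructions that received a duplicate."""
    total = sum(1 for b in bundles for op in b if op is not NOP)
    dup = sum(fill_idle_slots(b)[1] for b in bundles)
    return dup / total if total else 0.0


low_ilp = ["add", NOP, NOP, NOP]       # one instruction, three idle slots
high_ilp = ["add", "mul", "ld", NOP]   # three instructions, one idle slot
print(fill_idle_slots(low_ilp))
print(fill_idle_slots(high_ilp))
print(coverage([low_ilp, high_ilp]))
```

In this model the low-ILP bundle duplicates its only instruction, while the high-ILP bundle can protect just one of three. Lowering ILP, as the third approach does, would correspond here to spreading instructions over more bundles so that more idle slots become available.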

Original language	English
Pages (from-to)	13:1-13:21
Number of pages	21
Journal	ACM Journal on Emerging Technologies in Computing Systems
Volume	13
Issue number	2
DOIs	https://doi.org/10.1145/3001935
Publication status	Published - 2017

Bibliographical note

Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era and Regular Papers

Keywords

Fault tolerance
VLIW
soft errors
adaptive processor

Access to Document

10.1145/3001935

Cite this

@article{2c25495342b94cf9a26372ab9e686b32,

title = "Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors",

abstract = "Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low overhead fault tolerance approaches based on instruction duplication with zero latency detection, which uses a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses idle issue slots within a period of time to execute extra instructions considering distinct application phases. The second works at a finer grain, adaptively exploiting idle functional units at run-time. However, some applications present high instruction-level parallelism (ILP), so the ability to provide fault tolerance is reduced: fewer functional units will be idle, decreasing the number of potential duplicated instructions. The third approach attacks this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the latter improves fault tolerance with low performance degradation. All approaches are evaluated considering area, performance, power dissipation, and error coverage.",

keywords = "Fault tolerance, VLIW, soft errors, adaptive processor",

author = "Sartor, {A. L.} and Lorenzon, {A. F.} and Luigi Carro and Fernanda Kastensmidt and S. Wong and Beck, {Antonio C.S.}",

note = "Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era and Regular Papers",

year = "2017",

doi = "10.1145/3001935",

language = "English",

volume = "13",

pages = "13:1--13:21",

journal = "ACM Journal on Emerging Technologies in Computing Systems",

issn = "1550-4832",

publisher = "Association for Computing Machinery (ACM)",

number = "2",

}

TY - JOUR

T1 - Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors

AU - Sartor, A. L.

AU - Lorenzon, A. F.

AU - Carro, Luigi

AU - Kastensmidt, Fernanda

AU - Wong, S.

AU - Beck, Antonio C.S.

N1 - Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era and Regular Papers

PY - 2017

Y1 - 2017

N2 - Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low overhead fault tolerance approaches based on instruction duplication with zero latency detection, which uses a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses idle issue slots within a period of time to execute extra instructions considering distinct application phases. The second works at a finer grain, adaptively exploiting idle functional units at run-time. However, some applications present high instruction-level parallelism (ILP), so the ability to provide fault tolerance is reduced: fewer functional units will be idle, decreasing the number of potential duplicated instructions. The third approach attacks this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the latter improves fault tolerance with low performance degradation. All approaches are evaluated considering area, performance, power dissipation, and error coverage.

AB - Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low overhead fault tolerance approaches based on instruction duplication with zero latency detection, which uses a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses idle issue slots within a period of time to execute extra instructions considering distinct application phases. The second works at a finer grain, adaptively exploiting idle functional units at run-time. However, some applications present high instruction-level parallelism (ILP), so the ability to provide fault tolerance is reduced: fewer functional units will be idle, decreasing the number of potential duplicated instructions. The third approach attacks this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the latter improves fault tolerance with low performance degradation. All approaches are evaluated considering area, performance, power dissipation, and error coverage.

KW - Fault tolerance

KW - VLIW

KW - soft errors

KW - adaptive processor

U2 - 10.1145/3001935

DO - 10.1145/3001935

M3 - Special issue

SN - 1550-4832

VL - 13

SP - 13:1-13:21

JO - ACM Journal on Emerging Technologies in Computing Systems

JF - ACM Journal on Emerging Technologies in Computing Systems

IS - 2

ER -
