Safe Exploration Algorithms for Reinforcement Learning Controllers

Tommaso Mannucci; Erik Jan Van Kampen; Cornelis De Visser; Qiping Chu

doi:10.1109/TNNLS.2017.2654539

Control & Simulation

Research output: Contribution to journal › Article › Scientific › peer-review

69 Citations (Scopus)

40 Downloads (Pure)

Abstract

Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.

Original language	English
Pages (from-to)	1069-1081
Number of pages	13
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	29
Issue number	4
DOIs	https://doi.org/10.1109/TNNLS.2017.2654539
Publication status	Published - 1 Apr 2018

Keywords

Adaptive controllers
model-free control
reinforcement learning (RL)
safe exploration

Access to Document

10.1109/TNNLS.2017.2654539

SherpaFinalSubmissionPaperOnly (Accepted author manuscript, 2.44 MB)

Cite this

@article{a38766efe74c4c9181c7a2a13b6ec1a8,

title = "Safe Exploration Algorithms for Reinforcement Learning Controllers",

abstract = "Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.",

keywords = "Adaptive controllers, model-free control, reinforcement learning (RL), safe exploration",

author = "Tommaso Mannucci and {Van Kampen}, {Erik Jan} and {De Visser}, Cornelis and Qiping Chu",

year = "2018",

month = apr,

day = "1",

doi = "10.1109/TNNLS.2017.2654539",

language = "English",

volume = "29",

pages = "1069--1081",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

number = "4",

}

TY - JOUR

T1 - Safe Exploration Algorithms for Reinforcement Learning Controllers

AU - Mannucci, Tommaso

AU - Van Kampen, Erik Jan

AU - De Visser, Cornelis

AU - Chu, Qiping

PY - 2018/4/1

Y1 - 2018/4/1

N2 - Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.

AB - Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.

KW - Adaptive controllers

KW - model-free control

KW - reinforcement learning (RL)

KW - safe exploration

UR - http://www.scopus.com/inward/record.url?scp=85012028615&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2017.2654539

DO - 10.1109/TNNLS.2017.2654539

M3 - Article

AN - SCOPUS:85012028615

SN - 2162-237X

VL - 29

SP - 1069

EP - 1081

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 4

ER -
