Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review
Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) whose transition function is unknown. In situations where an arbitrary policy π is already being executed and the experiences with the environment have been recorded in a batch D, an RL algorithm can use D to compute a new policy π′. However, the policy computed by traditional RL algorithms may perform worse than π. Our goal is to develop safe RL algorithms, where the agent has high confidence that the performance of π′ is better than that of π, given D. To develop sample-efficient and safe RL algorithms, we combine ideas from exploration strategies in RL with a safe policy improvement method.
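The high-confidence improvement test described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' algorithm: it assumes episodes recorded in a batch D, ordinary importance sampling to evaluate the candidate policy π′ off-policy, and a Hoeffding lower bound on estimates normalized to a known range. The policy interface (a callable mapping an action and state to a probability) is a hypothetical API chosen for the example.

```python
import math

def importance_weighted_returns(batch, pi_new, pi_old):
    """Per-episode importance-sampling estimates of pi_new's return,
    using episodes in `batch` collected under the behavior policy pi_old.
    Each episode is a list of (state, action, reward) tuples; pi_new and
    pi_old map (action, state) to an action probability (hypothetical API)."""
    estimates = []
    for episode in batch:
        weight, ret = 1.0, 0.0
        for state, action, reward in episode:
            weight *= pi_new(action, state) / pi_old(action, state)
            ret += reward
        estimates.append(weight * ret)
    return estimates

def hoeffding_lower_bound(samples, delta, value_range=1.0):
    """(1 - delta)-confidence lower bound on the mean, assuming each
    sample lies in an interval of width `value_range`."""
    n = len(samples)
    mean = sum(samples) / n
    return mean - value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def is_safe_improvement(candidate_estimates, baseline_performance, delta=0.05):
    """Deploy the candidate policy only if, with confidence 1 - delta,
    its estimated performance exceeds the baseline's; otherwise keep pi."""
    return hoeffding_lower_bound(candidate_estimates, delta) > baseline_performance
```

In practice, plain importance-weighted estimates can have a very large range, which makes the Hoeffding bound loose; safe policy improvement methods therefore rely on tighter off-policy estimators and concentration inequalities, and the paper's contribution is to combine such a safety test with exploration strategies for sample efficiency.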
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 |
Editors | Sarit Kraus |
Publisher | International Joint Conferences on Artificial Intelligence (IJCAI) |
Pages | 6460-6461 |
Number of pages | 2 |
ISBN (Electronic) | 978-0-9992411-4-1 |
DOIs | |
Publication status | Published - 2019 |
Event | IJCAI 2019: 28th International Joint Conference on Artificial Intelligence, Macao, China. Duration: 10 Aug 2019 → 16 Aug 2019 |
Name | IJCAI International Joint Conference on Artificial Intelligence |
---|---|
Volume | 2019-August |
ISSN (Print) | 1045-0823 |
Conference | IJCAI 2019 |
---|---|
Country/Territory | China |
City | Macao |
Period | 10/08/19 → 16/08/19 |