Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) whose transition function is unknown. When an arbitrary policy π is already being executed and its experiences with the environment have been recorded in a batch D, an RL algorithm can use D to compute a new policy π′. However, the policy computed by traditional RL algorithms may perform worse than π. Our goal is to develop safe RL algorithms, in which the agent has high confidence that, given D, the performance of π′ is better than that of π. To obtain sample-efficient and safe RL algorithms, we combine ideas from exploration strategies in RL with a safe policy improvement method.
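The safety criterion described above — deploy π′ only if, with high confidence, it outperforms π given the batch D — is commonly checked with off-policy evaluation plus a concentration bound. The sketch below is illustrative only and is not the paper's algorithm: it uses per-trajectory importance sampling and a Hoeffding-style lower confidence bound, and all function names (`is_estimates`, `safe_to_deploy`) and the bounded-return assumption are this example's own.

```python
import numpy as np

def is_estimates(batch, pi_new, pi_behavior, gamma=0.99):
    """Per-trajectory importance-sampled return estimates for pi_new.

    batch: list of trajectories, each a list of (state, action, reward)
    pi_new, pi_behavior: callables giving the probability of action a in state s
    """
    estimates = []
    for traj in batch:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Cumulative importance weight corrects for acting under pi_behavior
            weight *= pi_new(a, s) / pi_behavior(a, s)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return np.array(estimates)

def safe_to_deploy(batch, pi_new, pi_behavior, baseline_return, delta=0.05):
    """Accept pi_new only if a (1 - delta) lower confidence bound on its
    estimated return exceeds the baseline policy's return.

    Uses a Hoeffding-style bound, assuming the weighted returns are
    bounded in [0, b] (a simplifying assumption for this sketch).
    """
    est = is_estimates(batch, pi_new, pi_behavior)
    n = len(est)
    b = est.max() if est.max() > 0 else 1.0
    lcb = est.mean() - b * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return lcb >= baseline_return
```

With this style of check, a candidate π′ that cannot be certified falls back to the baseline π, which is what makes the improvement step "safe": the agent never knowingly deploys a policy whose lower confidence bound falls below the behavior policy's performance.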
Original language: English
Title of host publication: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Editors: S. Kraus
Publisher: International Joint Conferences on Artificial Intelligence (IJCAI)
ISBN (Electronic): 978-0-9992411-4-1
Publication status: Published - 2019
Event: IJCAI 2019: 28th International Joint Conference on Artificial Intelligence - Macao, China
Duration: 10 Aug 2019 - 16 Aug 2019


Conference: IJCAI 2019
