Off-policy experience retention for deep actor-critic learning

Tim de Bruin, Jens Kober, K.P. Tuyls, Robert Babuska

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

    Abstract

    When a limited number of experiences is kept in memory to train a reinforcement learning agent, the criterion that determines which experiences are retained can have a strong impact on learning performance. In this paper, we argue that for actor-critic learning in domains with significant momentum, it is important to retain experiences with off-policy actions when the amount of exploration is reduced over time. This claim is supported by simulation experiments on a pendulum swing-up problem and a magnetic manipulation task. Additionally, we compare our strategy to database overwriting policies based on obtaining experiences spread out over the state-action space, and to using the temporal difference error as a proxy for the value of experiences.
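
    The sketch below is a minimal illustration of the retention idea described in the abstract, not the paper's exact method: when the replay memory is full, the stored transition whose action lies closest to what the current (deterministic) policy would choose in that state is overwritten, so the most off-policy experiences are retained longest. The class name, the Euclidean action distance, and the `policy` callable are assumptions made for this example.

```python
import numpy as np

class OffPolicyRetentionBuffer:
    """Fixed-size experience memory that, once full, overwrites the transition
    whose stored action is closest to the current policy's action for the same
    state, so off-policy experiences are kept longest (illustrative sketch)."""

    def __init__(self, capacity, state_dim, action_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim))
        self.actions = np.zeros((capacity, action_dim))
        self.rewards = np.zeros(capacity)
        self.next_states = np.zeros((capacity, state_dim))
        self.size = 0

    def add(self, state, action, reward, next_state, policy):
        if self.size < self.capacity:
            idx = self.size          # buffer not yet full: append
            self.size += 1
        else:
            # Distance between each stored action and the action the current
            # (deterministic) policy would take in the stored state; a small
            # distance means the experience is nearly on-policy and therefore
            # the least valuable one to retain under this criterion.
            policy_actions = policy(self.states[: self.size])
            deviations = np.linalg.norm(
                self.actions[: self.size] - policy_actions, axis=1)
            idx = int(np.argmin(deviations))
        self.states[idx] = state
        self.actions[idx] = action
        self.rewards[idx] = reward
        self.next_states[idx] = next_state

    def sample(self, batch_size, rng=np.random):
        # Uniform sampling of a mini-batch for the actor-critic update.
        idx = rng.randint(0, self.size, size=batch_size)
        return (self.states[idx], self.actions[idx],
                self.rewards[idx], self.next_states[idx])
```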
    Original language: English
    Title of host publication: Deep Reinforcement Learning Workshop, NIPS 2016 - December 9, 2016
    Number of pages: 9
    Publication status: Published - 2016
    Event: NIPS 2016: 30th Conference on Neural Information Processing Systems - Centre Convencions Internacional Barcelona, Barcelona, Spain
    Duration: 5 Dec 2016 - 10 Dec 2016
    https://nips.cc/Conferences/2016

    Conference

    Conference: NIPS 2016: 30th Conference on Neural Information Processing Systems
    Abbreviated title: NIPS
    Country/Territory: Spain
    City: Barcelona
    Period: 5/12/16 - 10/12/16
    Internet address: https://nips.cc/Conferences/2016
