Experience selection in deep reinforcement learning for control

Tim De Bruin; Jens Kober; Karl Tuyls; Robert Babuška

Learning & Autonomous Control

Research output: Contribution to journal › Article › Scientific › peer-review

48 Citations (Scopus)

111 Downloads (Pure)

Abstract

Experience replay is a technique that allows off-policy reinforcement-learning methods to reuse past experiences. The stability and speed of convergence of reinforcement learning, as well as the eventual performance of the learned policy, are strongly dependent on the experiences being replayed. Which experiences are replayed depends on two important choices. The first is which and how many experiences to retain in the experience replay buffer. The second choice is how to sample the experiences that are to be replayed from that buffer. We propose new methods for the combined problem of experience retention and experience sampling. We refer to the combination as experience selection. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy.
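
As a rough illustration of the two choices described in the abstract, the following Python sketch treats retention (what stays in the buffer) and sampling (what gets replayed) as separate decisions on one buffer. It is not the authors' method. The `ReplayBuffer` class, the `Experience` tuple, and the option names `fifo`, `td`, and `prioritized` are hypothetical, and the temporal difference errors are assumed to be supplied by the caller rather than computed by a learner.

```python
import random
from collections import namedtuple

# Hypothetical experience record; td_error is assumed to be given by the caller.
Experience = namedtuple("Experience", "state action reward next_state td_error")


class ReplayBuffer:
    """Toy buffer that separates retention from sampling (illustrative only)."""

    def __init__(self, capacity, retention="fifo", seed=0):
        if retention not in ("fifo", "td"):
            raise ValueError("retention must be 'fifo' or 'td'")
        self.capacity = capacity
        self.retention = retention
        self.items = []
        self.rng = random.Random(seed)

    def add(self, exp):
        """Store exp, evicting one stored experience if the buffer is full."""
        if len(self.items) >= self.capacity:
            if self.retention == "fifo":
                # Age as the utility proxy: drop the oldest experience.
                self.items.pop(0)
            else:
                # TD error as the utility proxy: drop the stored experience
                # with the smallest absolute TD error.
                idx = min(range(len(self.items)),
                          key=lambda i: abs(self.items[i].td_error))
                self.items.pop(idx)
        self.items.append(exp)

    def sample(self, batch_size, prioritized=False, eps=1e-3):
        """Draw a batch with replacement, uniformly or weighted by |TD error|."""
        if prioritized:
            weights = [abs(e.td_error) + eps for e in self.items]
            return self.rng.choices(self.items, weights=weights, k=batch_size)
        return self.rng.choices(self.items, k=batch_size)


# Usage: the same stream of experiences under two retention rules.
td_errors = [0.5, 0.1, 0.9, 0.3, 0.7]
fifo_buf = ReplayBuffer(capacity=3, retention="fifo")
td_buf = ReplayBuffer(capacity=3, retention="td")
for step, err in enumerate(td_errors):
    exp = Experience(state=step, action=0, reward=float(step),
                     next_state=step + 1, td_error=err)
    fifo_buf.add(exp)
    td_buf.add(exp)

batch = td_buf.sample(batch_size=4, prioritized=True)
```

In this sketch the retention rule and the sampling rule can be mixed freely, for example FIFO retention with prioritized sampling. Which pairing is appropriate depends on the control problem, which is the question the paper investigates.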

Original language	English
Article number	9
Number of pages	56
Journal	Journal of Machine Learning Research
Volume	19
Issue number	9
Publication status	Published - 2018

Keywords

Control
Deep learning
Experience replay
Reinforcement learning
Robotics

Access to Document

17-131: Final published version, 2.57 MB, Licence: CC BY

https://www.jmlr.org/papers/v19/17-131.html

Cite this

@article{9daa8734df0d420aab11040b3eb5e6a9,

title = "Experience selection in deep reinforcement learning for control",

abstract = "Experience replay is a technique that allows off-policy reinforcement-learning methods to reuse past experiences. The stability and speed of convergence of reinforcement learning, as well as the eventual performance of the learned policy, are strongly dependent on the experiences being replayed. Which experiences are replayed depends on two important choices. The first is which and how many experiences to retain in the experience replay buffer. The second choice is how to sample the experiences that are to be replayed from that buffer. We propose new methods for the combined problem of experience retention and experience sampling. We refer to the combination as experience selection. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy.",

keywords = "Control, Deep learning, Experience replay, Reinforcement learning, Robotics",

author = "{De Bruin}, Tim and Jens Kober and Karl Tuyls and Robert Babu{\v s}ka",

year = "2018",

language = "English",

volume = "19",

journal = "Journal of Machine Learning Research",

issn = "1532-4435",

publisher = "Microtome Publishing",

number = "9",

}

TY - JOUR

T1 - Experience selection in deep reinforcement learning for control

AU - De Bruin, Tim

AU - Kober, Jens

AU - Tuyls, Karl

AU - Babuška, Robert

PY - 2018

Y1 - 2018

N2 - Experience replay is a technique that allows off-policy reinforcement-learning methods to reuse past experiences. The stability and speed of convergence of reinforcement learning, as well as the eventual performance of the learned policy, are strongly dependent on the experiences being replayed. Which experiences are replayed depends on two important choices. The first is which and how many experiences to retain in the experience replay buffer. The second choice is how to sample the experiences that are to be replayed from that buffer. We propose new methods for the combined problem of experience retention and experience sampling. We refer to the combination as experience selection. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy.

KW - Control

KW - Deep learning

KW - Experience replay

KW - Reinforcement learning

KW - Robotics

UR - http://resolver.tudelft.nl/uuid:9daa8734-df0d-420a-ab11-040b3eb5e6a9

UR - http://www.scopus.com/inward/record.url?scp=85052949234&partnerID=8YFLogxK

M3 - Article

SN - 1532-4435

VL - 19

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

IS - 9

M1 - 9

ER -
