Efficient exploration with Double Uncertain Value Networks

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific


Abstract

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration: the first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We describe how to learn both distributions with deep neural networks, estimating parametric uncertainty with Bayesian dropout and propagating return uncertainty through the Bellman equation as a Gaussian distribution. We then show that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is derived directly from the learned distributions via Thompson sampling. Experimental results show that both types of uncertainty can vastly improve learning in domains with a strong exploration challenge.
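
The abstract's two uncertainty estimates can be illustrated with a minimal sketch (not the authors' code): parametric uncertainty via Monte Carlo dropout kept active at action-selection time, and return uncertainty as a per-action Gaussian with mean and log-variance heads, combined through Thompson sampling. The network name, layer sizes, and hyperparameters below are illustrative assumptions, and the training target (propagating the Gaussian return distribution through the Bellman equation) is omitted.

```python
import torch
import torch.nn as nn

class UncertainValueNet(nn.Module):
    """Q-network with dropout (parametric uncertainty) and a Gaussian
    return distribution per action (return uncertainty)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128, p_drop: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.mean_head = nn.Linear(hidden, n_actions)    # E[return | s, a]
        self.logvar_head = nn.Linear(hidden, n_actions)  # log Var[return | s, a]

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.mean_head(h), self.logvar_head(h)

def thompson_action(net: UncertainValueNet, obs: torch.Tensor) -> int:
    """Thompson sampling over both uncertainties: one stochastic forward pass
    with dropout active (a sample from the approximate parameter posterior),
    then one draw from each action's Gaussian return distribution."""
    net.train()  # keep dropout stochastic (MC dropout)
    with torch.no_grad():
        mean, logvar = net(obs.unsqueeze(0))
        std = (0.5 * logvar).exp()
        sampled_q = mean + std * torch.randn_like(std)
    return int(sampled_q.argmax(dim=-1).item())

# Usage on a toy observation (hypothetical dimensions):
net = UncertainValueNet(obs_dim=4, n_actions=2)
print(thompson_action(net, torch.zeros(4)))
```

Acting greedily with respect to the sampled values, rather than the means, is what makes exploration directed: actions whose value the agent is still uncertain about are sampled high often enough to be tried.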
Original language: English
Title of host publication: Deep Reinforcement Learning Symposium, NIPS 2017
Pages: 1-17
Number of pages: 17
Publication status: Published - 2017
Event: NIPS 2017: Thirty-first Conference on Neural Information Processing Systems - Long Beach, United States
Duration: 7 Dec 2017 – 7 Dec 2017
Conference number: 31st

Conference

Conference: NIPS 2017
Country/Territory: United States
City: Long Beach
Period: 7/12/17 – 7/12/17

