Reinforcement learning (RL), like any online learning method, inevitably faces the exploration-exploitation dilemma. A learning algorithm that requires as few data samples as possible is called sample efficient, and the design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms rely on the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
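To illustrate the principle of optimism in the face of uncertainty mentioned in the abstract, here is a minimal, hedged sketch of a tabular Q-learning update with a count-based exploration bonus, in the style of UCB-Hoeffding Q-learning. The horizon `H`, the constant `c`, and the helper names are illustrative assumptions, not the paper's actual algorithm or parameters.

```python
import math

def ucb_bonus(count, horizon=10, c=1.0, delta=0.1, num_updates=1000):
    """Exploration bonus that shrinks as a state-action pair is visited.

    Illustrative UCB-style bonus; the constants are placeholder choices.
    """
    t = max(count, 1)
    return c * math.sqrt(horizon ** 3 * math.log(num_updates / delta) / t)

def optimistic_update(q, counts, s, a, reward, v_next, horizon=10):
    """One optimistic Q-update.

    Both the learning rate and the bonus depend on the visit count, so
    estimates start optimistically high and concentrate with experience.
    """
    counts[(s, a)] = counts.get((s, a), 0) + 1
    t = counts[(s, a)]
    alpha = (horizon + 1) / (horizon + t)       # count-based learning rate
    bonus = ucb_bonus(t, horizon)
    target = reward + v_next + bonus            # optimistic target value
    # Q-values are initialized optimistically to the horizon.
    q[(s, a)] = (1 - alpha) * q.get((s, a), horizon) + alpha * target
    return q[(s, a)]
```

Because the bonus decays like `1/sqrt(count)`, rarely tried actions keep inflated value estimates and are preferentially explored, which is the mechanism behind the optimism principle the paper builds on.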
Original language: English
Title of host publication: Proceedings of AAMAS'20
Editors: Bo An, Neil Yorke-Smith, Amal El Fallah Seghrouchni, Gita Sukthankar
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 913-921
Number of pages: 9
ISBN (Electronic): 978-1-4503-7518-4
Publication status: Published - May 2020
Event: AAMAS 2020: The 19th International Conference on Autonomous Agents and Multi-Agent Systems - Auckland, New Zealand
Duration: 9 May 2020 - 13 May 2020
Conference number: 19th
https://aamas2020.conference.auckland.ac.nz

Conference

Conference: AAMAS 2020
Country: New Zealand
City: Auckland
Period: 9/05/20 - 13/05/20
Other: Virtual/online event due to COVID-19

ID: 72802500