Bayesian RL in factored POMDPs

Sammie Katt, Frans Oliehoek, Chris Amato

Research output: Contribution to conference › Abstract › Scientific


Abstract

Robust decision-making agents in any non-trivial system must reason over uncertainty of various types, such as action outcomes, the agent's current state, and the dynamics of the environment. The outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which unfortunately are often not available. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment, and use this knowledge to select actions that, theoretically, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and those that are have limited scaling properties. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayes network, an approach that exploits structure in the form of independence between state and observation features.
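
The scalability argument can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed, known dependency structure, Dirichlet count priors over each feature's conditional distribution, and hypothetical names (`tabular_transition_params`, `factored_transition_params`, `FeatureCounts`) chosen only for this example. It compares how many counts a tabular transition model needs with how many a factored one needs, and shows a count update for a single feature.

```python
from math import prod


def tabular_transition_params(feature_sizes, num_actions):
    """Counts in a tabular transition model: |S| * |A| * |S|."""
    num_states = prod(feature_sizes)
    return num_states * num_actions * num_states


def factored_transition_params(feature_sizes, parents, num_actions):
    """Counts when next-state feature i depends only on parents[i].

    parents[i] lists the indices of the current-state features that
    feature i is assumed to depend on (an assumption of this sketch).
    """
    total = 0
    for i, size in enumerate(feature_sizes):
        parent_configs = prod(feature_sizes[p] for p in parents[i])
        total += parent_configs * num_actions * size
    return total


class FeatureCounts:
    """Dirichlet counts for one feature's conditional distribution."""

    def __init__(self, size, prior=1.0):
        self.size = size
        self.prior = prior
        self.counts = {}  # (parent_values, action) -> list of counts

    def update(self, parent_values, action, next_value):
        """Add one observed transition to the matching count vector."""
        key = (parent_values, action)
        row = self.counts.setdefault(key, [self.prior] * self.size)
        row[next_value] += 1.0

    def expected_probs(self, parent_values, action):
        """Posterior mean of the Dirichlet for this parent setting."""
        key = (parent_values, action)
        row = self.counts.get(key, [self.prior] * self.size)
        total = sum(row)
        return [c / total for c in row]


if __name__ == "__main__":
    sizes = [4, 4, 2]              # three state features
    parents = [[0], [0, 1], [2]]   # hypothetical dependency structure
    print(tabular_transition_params(sizes, num_actions=3))
    print(factored_transition_params(sizes, parents, num_actions=3))
```

With three features of sizes 4, 4, and 2 and three actions, the tabular model in this sketch needs 3072 counts, while the assumed factored structure needs 252. The gap widens as features are added, which is the kind of independence structure the abstract refers to.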
Original language: English
Pages: 1-3
Number of pages: 3
Publication status: Published - 2019
Event: 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019 - Brussels, Belgium
Duration: 6 Nov 2019 to 8 Nov 2019

Conference

Conference: 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019
Country/Territory: Belgium
City: Brussels
Period: 6/11/19 to 8/11/19
