Documents

  • abstract39

    Final published version, 378 KB, PDF document

Abstract

Robust decision-making agents in any non-trivial system must reason over several types of uncertainty, such as action outcomes, the agent's current state, and the dynamics of the environment. Outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which is often unavailable. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment and use this knowledge to select actions that, in theory, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and those that are have limited scaling properties. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayes network, an approach that exploits structure in the form of independence between state and observation features.
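
The scalability contrast between the two representations can be made concrete with a short sketch. The Python snippet below is purely illustrative and not the implementation from [3]: the class names, the parents structure, and the count-based update are a hypothetical simplification of Dirichlet-count learning over a tabular versus a factored transition model.

    from collections import defaultdict

    class TabularPosterior:
        # Tabular BA-POMDP-style posterior: one Dirichlet count per full
        # (state, action, next state, observation) entry, so the number of
        # parameters grows with |S|^2 * |A| * |O|.
        def __init__(self):
            self.counts = defaultdict(lambda: 1.0)  # uniform Dirichlet prior

        def update(self, s, a, s_next, o):
            self.counts[(s, a, s_next, o)] += 1.0   # Bayesian update = count increment

    class FactoredPosterior:
        # Factored posterior: one small Dirichlet per next-state feature,
        # conditioned only on that feature's parents in the Bayes network,
        # exploiting independence between features. Observation features
        # would be handled analogously.
        def __init__(self, parents):
            # parents[i]: indices into the vector (s..., a) that feature i
            # depends on (a hypothetical encoding, for illustration only)
            self.parents = parents
            self.counts = [defaultdict(lambda: 1.0) for _ in parents]

        def update(self, s, a, s_next):
            joint = tuple(s) + (a,)
            for i, pa in enumerate(self.parents):
                key = tuple(joint[j] for j in pa)
                self.counts[i][(key, s_next[i])] += 1.0

    # Usage: a 3-feature state where each feature depends only on itself
    # and the action taken.
    post = FactoredPosterior(parents=[(0, 3), (1, 3), (2, 3)])
    post.update(s=(0, 1, 0), a=1, s_next=(1, 1, 0))

The tabular posterior keeps one count per entry of the full joint transition table, whereas the factored posterior only keeps counts per feature given its parents; under the independence assumptions encoded by the Bayes network, this is what allows the representation to scale.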
Original language: English
Pages: 1-3
Number of pages: 3
Publication status: Published - 2019
Event: 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019 - Brussels, Belgium
Duration: 6 Nov 2019 - 8 Nov 2019

Conference

Conference: 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019
Country: Belgium
City: Brussels
Period: 6/11/19 - 8/11/19