TY - GEN
T1 - Structure Learning for Safe Policy Improvement
AU - Simão, Thiago D.
AU - Spaan, Matthijs T.J.
PY - 2019/7
Y1 - 2019/7
N2 - We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.
AB - We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.
U2 - 10.24963/ijcai.2019/479
DO - 10.24963/ijcai.2019/479
M3 - Conference contribution
SP - 3453
EP - 3459
BT - Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
A2 - Kraus, S.
PB - International Joint Conferences on Artifical Intelligence (IJCAI)
T2 - 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Y2 - 10 August 2019 through 16 August 2019
ER -