Structure Learning for Safe Policy Improvement

Thiago D. Simão; Matthijs T.J. Spaan

doi:10.24963/ijcai.2019/479

Algorithmics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

8 Citations (Scopus)

Abstract

We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.

Original language	English
Title of host publication	Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Editors	S. Kraus
Publisher	International Joint Conferences on Artificial Intelligence (IJCAI)
Pages	3453-3459
Number of pages	7
ISBN (Electronic)	978-0-9992411-4-1
DOIs	https://doi.org/10.24963/ijcai.2019/479
Publication status	Published - Jul 2019
Event	28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China; Duration: 10 Aug 2019 → 16 Aug 2019

Conference

Conference	28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Country/Territory	China
City	Macao
Period	10/08/19 → 16/08/19

Access to Document

10.24963/ijcai.2019/479

Safe Online and Offline Reinforcement Learning
Simão, T. D., 2023, 128 p.
Research output: Thesis › Dissertation (TU Delft)

Open Access
202 Downloads (Pure)
Safe Policy Improvement with an Estimated Baseline Policy
Simão, T. D., Laroche, R. & Tachet des Combes, R., 2020, Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC, p. 1269–1277 9 p. (AAMAS '20).
Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Open Access
Safe Policy Improvement with Baseline Bootstrapping in Factored Environments
Simão, T. D. & Spaan, M. T. J., 2019, 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. American Association for Artificial Intelligence (AAAI), p. 4967-4974 8 p.
Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review
19 Citations (Scopus)

Cite this

@inproceedings{fc2201e301f7462385a4e2634c0c172c,

title = "Structure Learning for Safe Policy Improvement",

abstract = "We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables. ",

author = "Sim{\~a}o, {Thiago D.} and Spaan, {Matthijs T.J.}",

year = "2019",

month = jul,

doi = "10.24963/ijcai.2019/479",

language = "English",

pages = "3453--3459",

editor = "S. Kraus",

booktitle = "Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence",

publisher = "International Joint Conferences on Artificial Intelligence (IJCAI)",

note = "28th International Joint Conference on Artificial Intelligence, IJCAI 2019 ; Conference date: 10-08-2019 Through 16-08-2019",

}

Simão, TD & Spaan, MTJ 2019, Structure Learning for Safe Policy Improvement. in S Kraus (ed.), Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence (IJCAI), pp. 3453-3459, 28th International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10/08/19. https://doi.org/10.24963/ijcai.2019/479

Structure Learning for Safe Policy Improvement. / Simão, Thiago D.; Spaan, Matthijs T.J.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. ed. / S. Kraus. International Joint Conferences on Artificial Intelligence (IJCAI), 2019. p. 3453-3459.

TY - GEN

T1 - Structure Learning for Safe Policy Improvement

AU - Simão, Thiago D.

AU - Spaan, Matthijs T.J.

PY - 2019/7

Y1 - 2019/7

AB - We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.

U2 - 10.24963/ijcai.2019/479

DO - 10.24963/ijcai.2019/479

M3 - Conference contribution

SP - 3453

EP - 3459

BT - Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

A2 - Kraus, S.

PB - International Joint Conferences on Artificial Intelligence (IJCAI)

T2 - 28th International Joint Conference on Artificial Intelligence, IJCAI 2019

Y2 - 10 August 2019 through 16 August 2019

ER -
