Abstract
In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.
Original language | English |
---|---|
Title of host publication | Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16 |
Publisher | American Association for Artificial Intelligence (AAAI) |
Pages | 3174-3180 |
Number of pages | 7 |
Publication status | Published - 2016 |
Event | 30th AAAI Conference on Artificial Intelligence - Phoenix, United States; Duration: 12 Feb 2016 → 17 Feb 2016; Conference number: 30 |
Publication series
Name | Proceedings of the AAAI |
---|---|
Publisher | Association for the Advancement of Artificial Intelligence |
ISSN (Print) | 2159-5399 |
ISSN (Electronic) | 2374-3468 |
Conference
Conference | 30th AAAI Conference on Artificial Intelligence |
---|---|
Abbreviated title | AAAI-16 |
Country/Territory | United States |
City | Phoenix |
Period | 12/02/16 → 17/02/16 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Keywords
- Markov Decision Process
- Transition-independent Multi-agent MDPs
- Reward interactions
- Conditional Return Graphs