Solving Transition-Independent Multi-agent MDPs with Sparse Interactions

Joris Scharpff; Diederik M. Roijers; Frans A. Oliehoek; Matthijs T. J. Spaan; M.M. de Weerdt

Algorithmics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.
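
The problem structure named in the abstract can be illustrated with a small sketch. It is not the CoRe algorithm and does not build a conditional return graph; it only shows, under assumed toy numbers, what "agents can only affect their own state but their reward depends on joint transitions" means. The two agents, their states and actions, the local models `T1` and `T2`, and the reward values are all hypothetical.

```python
from itertools import product

# Hypothetical local transition models, one per agent:
# T_i[(local_state, local_action)] -> {next_local_state: probability}
T1 = {("idle", "work"): {"busy": 0.9, "idle": 0.1},
      ("idle", "wait"): {"idle": 1.0}}
T2 = {("idle", "work"): {"busy": 0.8, "idle": 0.2},
      ("idle", "wait"): {"idle": 1.0}}


def joint_transition_prob(local_models, state, action, next_state):
    """Transition independence: the joint probability is the product
    of each agent's local transition probability."""
    p = 1.0
    for T_i, s_i, a_i, n_i in zip(local_models, state, action, next_state):
        p *= T_i[(s_i, a_i)].get(n_i, 0.0)
    return p


def joint_reward(state, action, next_state):
    """Assumed reward: +1 per agent that becomes busy, plus an interaction
    penalty that depends on the JOINT transition (both busy at once)."""
    local = sum(1.0 for n_i in next_state if n_i == "busy")
    interaction = -1.5 if all(n_i == "busy" for n_i in next_state) else 0.0
    return local + interaction


def expected_reward(local_models, state, action):
    """Expected one-step reward of a joint action, enumerating joint outcomes."""
    options = [T_i[(s_i, a_i)].keys()
               for T_i, s_i, a_i in zip(local_models, state, action)]
    total = 0.0
    for next_state in product(*options):
        p = joint_transition_prob(local_models, state, action, next_state)
        total += p * joint_reward(state, action, next_state)
    return total


if __name__ == "__main__":
    s = ("idle", "idle")
    for a in product(["work", "wait"], repeat=2):
        print(a, round(expected_reward([T1, T2], s, a), 3))
```

In this toy setting the transition model factorises per agent, but the reward cannot be written as a sum of per-agent terms because of the interaction penalty, which is the kind of dependency the abstract says CRGs represent compactly.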

Original language	English
Title of host publication	Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16
Publisher	American Association for Artificial Intelligence (AAAI)
Pages	3174-3180
Number of pages	7
Publication status	Published - 2016
Event	30th AAAI Conference on Artificial Intelligence, Phoenix, United States
Duration	12 Feb 2016 → 17 Feb 2016
Conference number	30

Publication series

Name	Proceedings of the AAAI
Publisher	Association for the Advancement of Artificial Intelligence.
ISSN (Print)	2159-5399
ISSN (Electronic)	2374-3468

Conference

Conference	30th AAAI Conference on Artificial Intelligence
Abbreviated title	AAAI-16
Country/Territory	United States
City	Phoenix
Period	12/02/16 → 17/02/16

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Markov Decision Process
Transition-independent Multi-agent MDPs
Reward interactions
Conditional Return Graphs

Access to Document

10405_Article_Text_13933_1_2_20201228 (Final published version, 851 KB)

https://ojs.aaai.org/index.php/AAAI/article/view/10405
Licence: Other

Cite this

Scharpff, J., Roijers, D. M., Oliehoek, F. A., Spaan, M. T. J., & de Weerdt, M. M. (2016). Solving Transition-Independent Multi-agent MDPs with Sparse Interactions. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16 (pp. 3174-3180). (Proceedings of the AAAI). American Association for Artificial Intelligence (AAAI). https://ojs.aaai.org/index.php/AAAI/article/view/10405

@inproceedings{a9a91806a8f24c8a833fbe8bcefbccbb,

title = "Solving Transition-Independent Multi-agent MDPs with Sparse Interactions",

abstract = "In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.",

keywords = "Markov Decision Process, Transition-independent Multi-agent MDPs, Reward interactions, Conditional Return Graphs",

author = "Joris Scharpff and Roijers, {Diederik M.} and Oliehoek, {Frans A.} and Spaan, {Matthijs T. J.} and {de Weerdt}, M.M.",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 30th AAAI Conference on Artificial Intelligence, AAAI-16 ; Conference date: 12-02-2016 Through 17-02-2016",

year = "2016",

language = "English",

series = "Proceedings of the AAAI",

publisher = "American Association for Artificial Intelligence (AAAI)",

pages = "3174--3180",

booktitle = "Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16",

address = "United States",

}

Scharpff, J, Roijers, DM, Oliehoek, FA, Spaan, MTJ & de Weerdt, MM 2016, Solving Transition-Independent Multi-agent MDPs with Sparse Interactions. in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16. Proceedings of the AAAI, American Association for Artificial Intelligence (AAAI), pp. 3174-3180, 30th AAAI Conference on Artificial Intelligence, Phoenix, United States, 12/02/16. <https://ojs.aaai.org/index.php/AAAI/article/view/10405>

Solving Transition-Independent Multi-agent MDPs with Sparse Interactions. / Scharpff, Joris; Roijers, Diederik M.; Oliehoek, Frans A. et al.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16. American Association for Artificial Intelligence (AAAI), 2016. p. 3174-3180 (Proceedings of the AAAI).

TY - GEN

T1 - Solving Transition-Independent Multi-agent MDPs with Sparse Interactions

AU - Scharpff, Joris

AU - Roijers, Diederik M.

AU - Oliehoek, Frans A.

AU - Spaan, Matthijs T. J.

AU - de Weerdt, M.M.

N1 - Conference code: 30

PY - 2016

Y1 - 2016

N2 - In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.

AB - In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.

KW - Markov Decision Process

KW - Transition-independent Multi-agent MDPs

KW - Reward interactions

KW - Conditional Return Graphs

UR - https://arxiv.org/abs/1511.09047

M3 - Conference contribution

T3 - Proceedings of the AAAI

SP - 3174

EP - 3180

BT - Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16

PB - American Association for Artificial Intelligence (AAAI)

T2 - 30th AAAI Conference on Artificial Intelligence

Y2 - 12 February 2016 through 17 February 2016

ER -
