Resource-constrained Multi-agent Markov Decision Processes

Frits de Nijs

doi:10.4233/uuid:89c0f1a2-d19f-4466-9cc5-52aeb3950e53

Algorithmics

Research output: Thesis › Dissertation (TU Delft)

Abstract

Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid.
This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward. To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints. Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as a back-up strategy for when a communication disruption prevents the use of dynamic allocations.
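The model ingredients named in the abstract (probabilistic state transitions, state-action rewards, and per-action resource consumption) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the two-state model, the names `P`, `R`, `C`, `evaluate`, and `is_feasible`, the numbers, and the choice to check the resource limit in expectation over a finite horizon, which is only one possible reading of a cumulative consumption constraint.

```python
from typing import Dict, List, Tuple

# Hypothetical single-agent model (illustrative numbers only).
# P[s][a] lists (next_state, probability) pairs: the transition model.
P: Dict[int, Dict[str, List[Tuple[int, float]]]] = {
    0: {"off": [(0, 1.0)], "on": [(1, 0.8), (0, 0.2)]},
    1: {"off": [(0, 0.5), (1, 0.5)], "on": [(1, 1.0)]},
}
# R[s][a]: reward collected for choosing action a in state s.
R: Dict[int, Dict[str, float]] = {
    0: {"off": 0.0, "on": 0.0},
    1: {"off": 1.0, "on": 1.0},
}
# C[a]: resource consumed each time action a is chosen.
C: Dict[str, float] = {"off": 0.0, "on": 1.0}


def evaluate(policy: Dict[int, str], horizon: int, start: int) -> Tuple[float, float]:
    """Return (expected total reward, expected total consumption) of a
    stationary policy over `horizon` steps, by backward induction."""
    value = {s: (0.0, 0.0) for s in P}  # values with zero steps remaining
    for _ in range(horizon):
        new_value = {}
        for s in P:
            a = policy[s]
            reward, consumption = R[s][a], C[a]
            for s_next, prob in P[s][a]:
                reward += prob * value[s_next][0]
                consumption += prob * value[s_next][1]
            new_value[s] = (reward, consumption)
        value = new_value
    return value[start]


def is_feasible(policy: Dict[int, str], horizon: int, start: int, limit: float) -> bool:
    """Assumed constraint form: expected cumulative consumption <= limit."""
    return evaluate(policy, horizon, start)[1] <= limit


always_on = {0: "on", 1: "on"}
always_off = {0: "off", 1: "off"}
print(evaluate(always_on, horizon=2, start=0))
print(is_feasible(always_on, horizon=2, start=0, limit=1.5))
print(is_feasible(always_off, horizon=2, start=0, limit=1.5))
```

In this toy setting the always-on policy earns more reward but exceeds the assumed limit, while the always-off policy is feasible but earns nothing; a constrained planner would search for the best policy among the feasible ones.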

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	de Weerdt, M.M. (Supervisor); Spaan, M.T.J. (Supervisor)
Thesis sponsors	Alliander
Award date	4 Apr 2019
Electronic ISBNs	978-94-6375-357-9
DOIs	https://doi.org/10.4233/uuid:89c0f1a2-d19f-4466-9cc5-52aeb3950e53
Publication status	Published - 21 Feb 2019

Keywords

Decision making under uncertainty
Multi-agent systems
Optimization
Constraint decoupling
Reinforcement learning

Access to Document

10.4233/uuid:89c0f1a2-d19f-4466-9cc5-52aeb3950e53

FdN-thesis: Final published version, 3.27 MB

Cite this

@phdthesis{89c0f1a2d19f44669cc552aeb3950e53,

title = "Resource-constrained Multi-agent Markov Decision Processes",

abstract = "Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid. This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward. To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints. Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as a back-up strategy for when a communication disruption prevents the use of dynamic allocations.",

keywords = "Decision making under uncertainty, Multi-agent systems, Optimization, Constraint decoupling, Reinforcement learning",

author = "{de Nijs}, Frits",

year = "2019",

month = feb,

day = "21",

doi = "10.4233/uuid:89c0f1a2-d19f-4466-9cc5-52aeb3950e53",

language = "English",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Resource-constrained Multi-agent Markov Decision Processes

AU - de Nijs, Frits

PY - 2019/2/21

Y1 - 2019/2/21

N2 - Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid. This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward. To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints. Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as a back-up strategy for when a communication disruption prevents the use of dynamic allocations.

AB - Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid. This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward. To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints. Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as a back-up strategy for when a communication disruption prevents the use of dynamic allocations.

KW - Decision making under uncertainty

KW - Multi-agent systems

KW - Optimization

KW - Constraint decoupling

KW - Reinforcement learning

U2 - 10.4233/uuid:89c0f1a2-d19f-4466-9cc5-52aeb3950e53

DO - 10.4233/uuid:89c0f1a2-d19f-4466-9cc5-52aeb3950e53

M3 - Dissertation (TU Delft)

ER -