Autonomous guidance and navigation problems often involve high-dimensional state and action spaces and multiple objectives, leading to an explosion in the number of states and actions known as the ‘curse of dimensionality’. Furthermore, such systems are often only partially observable rather than having perfect perception of their environment. Recent research has addressed these problems with Hierarchical Reinforcement Learning, which typically applies the same or similar reinforcement learning methods within one application so that multiple objectives can be combined. However, no single learning method benefits all objectives. To acquire optimal decision-making most efficiently, this paper proposes a hybrid Hierarchical Reinforcement Learning method consisting of several levels, where each level uses a different method to optimize learning with different types of information and objectives. An algorithm based on the proposed method is provided and applied to an online guidance and navigation task. The navigation environments are complex, partially observable, and a priori unknown. Simulation results indicate that, compared to flat or non-hybrid methods, the proposed hybrid Hierarchical Reinforcement Learning method helps to accelerate learning and to alleviate the ‘curse of dimensionality’ in complex decision-making tasks. In addition, the mixture of relative micro states and absolute macro states helps to reduce uncertainty or ambiguity at high levels, to transfer learned results efficiently within and across tasks, and to cope with non-stationary environments. The proposed method can yield a hierarchical optimal policy for autonomous guidance and navigation without a priori knowledge of the system or the environment.
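The two-level decomposition described in the abstract can be illustrated with a generic sketch (this is not the paper's algorithm): a high-level policy selects subgoals over macro states, while a low-level Q-learner executes primitive actions to reach each subgoal, each level learning from its own reward signal. The toy corridor environment, the subgoal set, and all names below are assumptions for illustration only.

```python
import random

random.seed(0)

# Hypothetical toy environment: a 1-D corridor of N cells; the agent starts
# at cell 0 and must reach cell N-1.
N = 8
SUBGOALS = [3, 7]          # macro-level "waypoints" the high level can choose
ACTIONS = [-1, +1]         # primitive moves: left, right

q_high = {}                # Q(macro_state, subgoal)
q_low = {}                 # Q((state, subgoal), primitive_action)

def eps_greedy(q, state, choices, eps=0.1):
    """Pick a random choice with prob. eps, otherwise the greedy one."""
    if random.random() < eps:
        return random.choice(choices)
    return max(choices, key=lambda c: q.get((state, c), 0.0))

def run_episode(alpha=0.5, gamma=0.95):
    s, steps = 0, 0
    while s != N - 1 and steps < 100:
        # High level: choose a subgoal different from the current state.
        g = eps_greedy(q_high, s, [h for h in SUBGOALS if h != s])
        s_macro, r_macro = s, 0.0
        while s != g and steps < 100:          # low level pursues the subgoal
            a = eps_greedy(q_low, (s, g), ACTIONS)
            s2 = min(max(s + a, 0), N - 1)
            r = 1.0 if s2 == g else -0.1       # intrinsic reward: reach subgoal
            best = max(q_low.get(((s2, g), b), 0.0) for b in ACTIONS)
            q_low[((s, g), a)] = q_low.get(((s, g), a), 0.0) + alpha * (
                r + gamma * best - q_low.get(((s, g), a), 0.0))
            r_macro += -0.1                    # macro level pays per step
            s = s2
            steps += 1
        r_macro += 10.0 if s == N - 1 else 0.0  # extrinsic reward at the top
        best_h = max(q_high.get((s, h), 0.0) for h in SUBGOALS)
        q_high[(s_macro, g)] = q_high.get((s_macro, g), 0.0) + alpha * (
            r_macro + gamma * best_h - q_high.get((s_macro, g), 0.0))
    return steps

for _ in range(200):                            # train both levels jointly
    run_episode()

steps_final = run_episode(alpha=0.0)            # evaluate without learning
print(steps_final)
```

Because each level learns over its own (much smaller) state-action space, neither table grows with the full product of all variables, which is the sense in which a hierarchy can mitigate the curse of dimensionality.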

Original language: English
Pages (from-to): 443-457
Number of pages: 15
Publication status: Published - 2019

Research areas

  • Hierarchical Reinforcement Learning
  • Hybrid learning
  • Non-stationary environment
  • Online guidance and navigation
  • Partial observability
