Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors

Yi Chun Chen; Mykel J. Kochenderfer; Matthijs T.J. Spaan

doi:10.1109/IROS.2018.8594418

Algorithmics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

7 Citations (Scopus)

Abstract

A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed by both a theoretical analysis as well as a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.
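
As a rough illustration of the idea in the abstract, the following Python sketch computes QMDP-style Q-values by value iteration over the state space and lets the planning discount factor differ from the one in the original problem. It is not the authors' implementation: the function names (`qmdp_q_values`, `qmdp_action`), the two-state toy model, and the discount values 0.95 and 0.5 are all assumptions made here for illustration.

```python
import numpy as np


def qmdp_q_values(T, R, gamma, iters=500):
    """Value iteration on the underlying MDP (hypothetical helper).

    T[a, s, t]: probability of moving from state s to state t under action a.
    R[s, a]:    immediate reward for taking action a in state s.
    Returns Q[s, a] after `iters` Bellman backups.
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)  # greedy state values under the current Q
        # Q(s, a) = R(s, a) + gamma * sum_t T(a, s, t) * V(t)
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return Q


def qmdp_action(belief, Q):
    """QMDP action choice: maximize the belief-weighted Q-values."""
    return int(np.argmax(belief @ Q))


# Toy model (an assumption, not taken from the paper): two states, two actions.
T = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
])
R = np.array([
    [1.0, 0.0],  # state 0: staying pays 1, switching pays 0
    [0.0, 0.0],  # state 1: no immediate reward
])

gamma_original = 0.95  # discount factor of the original problem (assumed value)
gamma_planning = 0.5   # reduced discount factor used only for planning (assumed value)

Q_original = qmdp_q_values(T, R, gamma_original)
Q_reduced = qmdp_q_values(T, R, gamma_planning)

belief = np.array([0.5, 0.5])  # uniform belief over the two states
action_original = qmdp_action(belief, Q_original)
action_reduced = qmdp_action(belief, Q_reduced)
```

Whether a reduced planning discount helps on a given problem has to be checked by evaluating the resulting policy under the original discount factor, a step this sketch omits.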

Original language	English
Title of host publication	2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018
Editors	Carlos Balaguer, Hajime Asama, Danica Kragic, Kevin Lynch
Place of Publication	Piscataway, NJ, USA
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Pages	3531-3536
Number of pages	6
ISBN (Electronic)	978-1-5386-8094-0
DOIs	https://doi.org/10.1109/IROS.2018.8594418
Publication status	Published - 2018
Event	2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018 - Madrid, Spain. Duration: 1 Oct 2018 → 5 Oct 2018

Conference

Conference	2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018
Country/Territory	Spain
City	Madrid
Period	1/10/18 → 5/10/18

Access to Document

10.1109/IROS.2018.8594418

Cite this

Chen, Y. C., Kochenderfer, M. J., & Spaan, M. T. J. (2018). Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors. In C. Balaguer, H. Asama, D. Kragic, & K. Lynch (Eds.), 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018 (pp. 3531-3536). Article 8594418. Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/IROS.2018.8594418

Chen, Yi Chun ; Kochenderfer, Mykel J. ; Spaan, Matthijs T.J. / Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018. editor / Carlos Balaguer ; Hajime Asama ; Danica Kragic ; Kevin Lynch. Piscataway, NJ, USA : Institute of Electrical and Electronics Engineers (IEEE), 2018. pp. 3531-3536

@inproceedings{c321b2be1a81445294a0149ec6f56691,

title = "Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors",

abstract = "A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed by both a theoretical analysis as well as a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.",

author = "Chen, {Yi Chun} and Kochenderfer, {Mykel J.} and Spaan, {Matthijs T.J.}",

year = "2018",

doi = "10.1109/IROS.2018.8594418",

language = "English",

pages = "3531--3536",

editor = "Balaguer, {Carlos} and Asama, {Hajime} and Kragic, {Danica} and Lynch, {Kevin}",

booktitle = "2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

address = "United States",

note = "2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018 ; Conference date: 01-10-2018 Through 05-10-2018",

}

Chen, YC, Kochenderfer, MJ & Spaan, MTJ 2018, Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors. in C Balaguer, H Asama, D Kragic & K Lynch (eds), 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, 8594418, Institute of Electrical and Electronics Engineers (IEEE), Piscataway, NJ, USA, pp. 3531-3536, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, Madrid, Spain, 1/10/18. https://doi.org/10.1109/IROS.2018.8594418

Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors. / Chen, Yi Chun; Kochenderfer, Mykel J.; Spaan, Matthijs T.J.
2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018. ed. / Carlos Balaguer; Hajime Asama; Danica Kragic; Kevin Lynch. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE), 2018. p. 3531-3536 8594418.

TY - GEN

T1 - Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors

AU - Chen, Yi Chun

AU - Kochenderfer, Mykel J.

AU - Spaan, Matthijs T.J.

PY - 2018

Y1 - 2018

N2 - A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed by both a theoretical analysis as well as a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.

AB - A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed by both a theoretical analysis as well as a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.

UR - http://www.scopus.com/inward/record.url?scp=85062944186&partnerID=8YFLogxK

U2 - 10.1109/IROS.2018.8594418

DO - 10.1109/IROS.2018.8594418

M3 - Conference contribution

AN - SCOPUS:85062944186

SP - 3531

EP - 3536

BT - 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018

A2 - Balaguer, Carlos

A2 - Asama, Hajime

A2 - Kragic, Danica

A2 - Lynch, Kevin

PB - Institute of Electrical and Electronics Engineers (IEEE)

CY - Piscataway, NJ, USA

T2 - 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018

Y2 - 1 October 2018 through 5 October 2018

ER -

Chen YC, Kochenderfer MJ, Spaan MTJ. Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors. In Balaguer C, Asama H, Kragic D, Lynch K, editors, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE). 2018. p. 3531-3536. 8594418. doi: 10.1109/IROS.2018.8594418
