A Necessary and Sufficient Condition for the Unique Solution of the Bellman Equation for LTL Surrogate Rewards

Zetong Xuan; Alper Kamil Bozkurt; Miroslav Pajic; Yu Wang

doi:10.1017/cbp.2025.10004

Accepted manuscript

Published online by Cambridge University Press: 04 November 2025

Zetong Xuan*: Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, 32611, FL, USA
Alper Kamil Bozkurt: Department of Computer Science, University of Maryland, College Park, 20742, MD, USA
Miroslav Pajic: Department of Electrical and Computer Engineering, Duke University, Durham, 27708, NC, USA
Yu Wang: Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, 32611, FL, USA
*Author for correspondence. Email: z.xuan@ufl.edu

Abstract

Linear Temporal Logic (LTL) offers a formal way of specifying complex objectives for Cyber-Physical Systems (CPS). When the dynamics are uncertain, planning for an LTL objective can be solved by model-free reinforcement learning (RL), which commonly relies on surrogate rewards. In a widely adopted surrogate reward approach, two discount factors are used to ensure that the expected return (i.e., the expected cumulative reward) approximates the satisfaction probability of the LTL objective. The expected return can then be estimated by methods based on Bellman updates, such as RL. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate, through an example, that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to an inaccurate evaluation of the expected return. To address this issue, we propose a condition that ensures the Bellman equation has the expected return as its unique solution. Specifically, we require that the solutions for states within rejecting bottom strongly connected components (BSCCs) be zero. We prove that this condition guarantees the uniqueness of the solution, first for recurrent states (i.e., states within a BSCC) and then for transient states. Finally, we numerically validate our results through case studies.
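
The non-uniqueness described in the abstract can be illustrated with a minimal sketch. The three-state Markov chain below is hypothetical and is not the example from the paper. The sketch also assumes one common form of the two-discount surrogate reward, which the abstract does not spell out: accepting states receive reward `1 - GAMMA_B` and discount `GAMMA_B`, while all other states receive reward 0 and discount `GAMMA`, here set to one. All names (`P`, `ACCEPTING`, `REJECTING_BSCC`, `backup`, `residual`, `iterate`) are illustrative.

```python
# Hypothetical 3-state Markov chain (an assumption, not taken from the paper):
#   state 0: transient, moves to state 1 or state 2 with probability 0.5 each
#   state 1: absorbing and accepting      (an accepting BSCC)
#   state 2: absorbing and non-accepting  (a rejecting BSCC)
P = [[0.0, 0.5, 0.5],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
ACCEPTING = {1}
REJECTING_BSCC = {2}
GAMMA_B = 0.99  # discount factor used in accepting states
GAMMA = 1.0     # discount factor used in all other states (set to one)


def reward(s):
    # Assumed surrogate reward: 1 - GAMMA_B in accepting states, 0 elsewhere.
    return 1.0 - GAMMA_B if s in ACCEPTING else 0.0


def discount(s):
    # State-dependent discount factor.
    return GAMMA_B if s in ACCEPTING else GAMMA


def backup(V):
    # One Bellman update: V(s) <- R(s) + Gamma(s) * sum_t P(s, t) * V(t).
    n = len(V)
    return [reward(s) + discount(s) * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)]


def residual(V):
    # Largest violation of the Bellman equation; zero means V solves it.
    return max(abs(a - b) for a, b in zip(backup(V), V))


def iterate(V, steps=5000, pin_rejecting=False):
    # Repeated Bellman updates; optionally force V = 0 on rejecting BSCC states.
    V = list(V)
    for _ in range(steps):
        V = backup(V)
        if pin_rejecting:
            for s in REJECTING_BSCC:
                V[s] = 0.0
    return V


expected_return = [0.5, 1.0, 0.0]  # satisfaction probability from each state
spurious = [1.0, 1.0, 1.0]         # a different vector that also fits

print(residual(expected_return), residual(spurious))
print(iterate([0.3, 0.2, 0.7]))
print(iterate([0.3, 0.2, 0.7], pin_rejecting=True))
```

In this sketch both `expected_return` and `spurious` satisfy the Bellman equation, because the equation for state 2 reduces to V(2) = V(2) when `GAMMA` is one. Plain iteration therefore keeps whatever initial value state 2 was given, whereas pinning the rejecting BSCC state to zero steers the iteration to the expected return.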

Keywords

Markov Chain; Limit-Deterministic Büchi Automaton; Reachability

Information

Type: Results
Information: Research Directions: Cyber-Physical Systems, Accepted manuscript, pp. 1-13

DOI: https://doi.org/10.1017/cbp.2025.10004
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
