Accepted manuscript

A Necessary and Sufficient Condition for the Unique Solution of the Bellman Equation for LTL Surrogate Rewards

Published online by Cambridge University Press: 04 November 2025

Zetong Xuan*
Affiliation:
Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, 32611, FL, USA
Alper Kamil Bozkurt
Affiliation:
Department of Computer Science, University of Maryland, College Park, 20742, MD, USA
Miroslav Pajic
Affiliation:
Department of Electrical and Computer Engineering, Duke University, Durham, 27708, NC, USA
Yu Wang
Affiliation:
Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, 32611, FL, USA
*Author for correspondence. Email: z.xuan@ufl.edu

Abstract

Linear Temporal Logic (LTL) offers a formal way of specifying complex objectives for Cyber-Physical Systems (CPS). When the dynamics are uncertain, planning for an LTL objective can be solved by model-free reinforcement learning (RL), commonly with the help of surrogate rewards. In a widely adopted surrogate reward approach, two discount factors are used to ensure that the expected return (i.e., the cumulative reward) approximates the satisfaction probability of the LTL objective. The expected return can then be estimated by methods based on Bellman updates, such as RL. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate, through an example, that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to an inaccurate evaluation of the expected return. To address this issue, we propose a condition that ensures the Bellman equation has the expected return as its unique solution. Specifically, we require that the solutions for states within rejecting bottom strongly connected components (BSCCs) be zero. We prove that this condition guarantees the uniqueness of the solution, first for recurrent states (i.e., states within a BSCC) and then for transient states. Finally, we numerically validate our results through case studies.
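
The non-uniqueness described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-state Markov chain is hypothetical, and the surrogate reward is assumed to take a form common in the two-discount-factor literature, namely reward 1 - GAMMA_B and discount GAMMA_B on accepting states, and reward 0 and discount GAMMA elsewhere. With GAMMA set to one, the value of the rejecting absorbing state is left unconstrained by the Bellman equation.

```python
# Hypothetical 3-state Markov chain (not the paper's example):
#   state 0: accepting, absorbing (an accepting BSCC)
#   state 1: rejecting, absorbing (a rejecting BSCC)
#   state 2: transient, moves to state 0 or state 1 with probability 0.5 each
P = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
accepting = [True, False, False]

GAMMA_B = 0.9  # assumed discount applied on accepting states
GAMMA = 1.0    # assumed discount applied elsewhere, set to one

# Assumed surrogate reward and state-dependent discount.
reward = [1.0 - GAMMA_B if a else 0.0 for a in accepting]
discount = [GAMMA_B if a else GAMMA for a in accepting]


def bellman_residual(v):
    """Largest |v(s) - (r(s) + discount(s) * sum_t P[s][t] * v(t))| over all states."""
    return max(
        abs(v[s] - (reward[s] + discount[s] * sum(p * v[t] for t, p in enumerate(P[s]))))
        for s in range(len(v))
    )


def candidate(c):
    """Value vector obtained by fixing the rejecting state's value to c."""
    return [1.0, c, 0.5 + 0.5 * c]


# Every choice of c satisfies the Bellman equation up to rounding error.
residuals = {c: bellman_residual(candidate(c)) for c in (0.0, 0.4, 1.0)}
for c, res in residuals.items():
    print(f"c = {c}: value at transient state = {candidate(c)[2]}, residual = {res:.1e}")
```

In this sketch only c = 0 gives the transient state the value 0.5, which equals the probability of reaching the accepting state. Fixing the value of the rejecting absorbing state to zero, in the spirit of the condition stated in the abstract, selects that solution.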

Information

Type
Results
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2025. Published by Cambridge University Press