Linear Temporal Logic (LTL) offers a formal way to specify complex objectives for Cyber-Physical Systems (CPS). When the dynamics are uncertain, planning for an LTL objective can be addressed by model-free reinforcement learning (RL). Surrogate rewards are commonly used to make LTL objectives amenable to model-free RL. In a widely adopted surrogate reward approach, two discount factors are used so that the expected return (i.e., the expected cumulative reward) approximates the satisfaction probability of the LTL objective. The expected return can then be estimated by methods based on Bellman updates, such as RL. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate, through an example, that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to an inaccurate evaluation of the expected return. To address this issue, we propose a condition that ensures the Bellman equation has the expected return as its unique solution. Specifically, we require that the solutions for states within rejecting bottom strongly connected components (BSCCs) be zero. We prove that this condition guarantees the uniqueness of the solution, first for recurrent states (i.e., states within a BSCC) and then for transient states. Finally, we numerically validate our results through case studies.
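As an illustrative sketch of the setting (the specific reward assignment below is an assumption based on a common two-discount-factor formulation, not a definition taken from this abstract): suppose accepting states receive reward $1-\gamma_B$ and discount $\gamma_B$, while all other states receive reward $0$ and discount $\gamma$, so the state-value Bellman equation reads
\[
V(s) \;=\; R(s) + \Gamma(s)\sum_{s'} P(s' \mid s)\, V(s'),
\qquad
(R(s), \Gamma(s)) =
\begin{cases}
(1-\gamma_B,\ \gamma_B) & \text{if } s \text{ is accepting},\\
(0,\ \gamma) & \text{otherwise}.
\end{cases}
\]
If $\gamma = 1$, then on a rejecting BSCC (where $R \equiv 0$ and $\Gamma \equiv 1$) the equation reduces to $V(s) = \sum_{s'} P(s' \mid s)\, V(s')$, which every constant function on that BSCC satisfies; fixing $V(s) = 0$ on rejecting BSCCs rules out these spurious solutions and is the condition under which uniqueness is established.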