基于学习的线性时序逻辑规范马尔可夫决策过程控制合成方法
A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications
关键词:线性时序逻辑;马尔可夫决策过程;控制合成
摘 要:We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property.We prove that our method is guaranteed to find a controller that satisfies the LTL property with probability one if such a policy exists, and we suggest empirically with a case study in traffic control that our method produces reasonable control strategies even when the LTL property cannot be satisfied with probability one.