Keywords: reinforcement learning; Markov decision process; execution traces; scene recognition
Abstract: We present a general method for learning dynamic policies that optimize Anytime performance in visual recognition. We formulate the problem as a Markov Decision Process and solve it with reinforcement learning techniques. Crucially, decisions are made at test time and depend on observed data and intermediate results. Our method applies to a wide variety of existing detectors and classifiers, because it learns from execution traces and requires no special knowledge of their implementation.
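To make the idea of learning a policy from execution traces concrete, the following is a minimal sketch under stated assumptions, not the method as the paper implements it. It assumes each logged step is a tuple `(state, action, reward, next_state, done)`, where a state is a hashable summary of intermediate results and the reward is some measure of recognition gain. Tabular batch Q-learning is used here as a stand-in learner. The names `learn_q_from_traces`, `greedy_action`, `detector_a`, and `detector_b`, as well as the toy reward values, are hypothetical.

```python
from collections import defaultdict


def learn_q_from_traces(traces, actions, gamma=1.0, sweeps=50, lr=0.5):
    """Estimate action values by repeated Q-learning sweeps over logged steps.

    The detectors are treated as black boxes: only their logged
    (state, action, reward, next_state, done) steps are used.
    """
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        for state, action, reward, next_state, done in traces:
            # Value of the best follow-up action; zero at the end of an episode.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += lr * (target - q[(state, action)])
    return q


def greedy_action(q, state, actions):
    """Test-time decision: pick the action with the highest estimated value."""
    return max(actions, key=lambda a: q[(state, a)])


# Toy traces (assumed values): two hypothetical detectors run in either order.
actions = ["detector_a", "detector_b"]
traces = [
    ("start", "detector_a", 0.2, "after_a", False),
    ("start", "detector_b", 0.5, "after_b", False),
    ("after_a", "detector_b", 0.3, "done", True),
    ("after_b", "detector_a", 0.1, "done", True),
]

q = learn_q_from_traces(traces, actions)
first_choice = greedy_action(q, "start", actions)
```

In this toy setting the learned values favor running `detector_b` first, since its immediate reward plus the best follow-up value (0.5 + 0.1) exceeds that of `detector_a` (0.2 + 0.3). A tabular representation only works for a small, discrete set of states; richer intermediate results would need a function approximator in its place.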