Industry Report Details - Industry Report Database

Least Squares Temporal Difference Actor-Critic Algorithm with Applications to Warehouse Management

Authors: Estanjini, R. M.; Li, K.; Paschalidis, I. C.
Affiliation: Boston Univ., MA
Processed: 2014-06-11
Source: Technical report (AD)

Keywords: information systems; dynamic programming; actor-critic algorithm; Markov decision process
Abstract: This paper develops a new approximate dynamic programming algorithm for Markov decision problems and applies it to a vehicle dispatching problem arising in warehouse management. The algorithm is of the actor-critic type and uses a least squares temporal difference learning method. It operates on a sample path of the system and optimizes the policy within a prespecified class parameterized by a parsimonious set of parameters. The method is applicable to a partially observable Markov decision process setting where the measurements of state variables are potentially corrupted and the cost is only observed through the imperfect state observations. We show that under reasonable assumptions, the algorithm converges to a locally optimal parameter set. We also show that the imperfect cost observations do not affect the policy and the algorithm minimizes the true expected cost. In the warehouse application, the problem is to dispatch sensor-equipped forklifts in order to minimize operating costs involving product movement delays and forklift maintenance. We consider instances where standard dynamic programming is computationally intractable. Simulation results confirm the theoretical claims of the paper and show that our algorithm converges more smoothly than earlier actor-critic algorithms while substantially outperforming heuristics used in practice.
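To make the "least squares temporal difference" idea in the abstract concrete, here is a minimal sketch of a textbook discounted-cost LSTD(0) critic for a fixed policy. It is not the report's algorithm, which couples such a critic with an actor update and handles imperfect observations. The function names (`lstd`, `solve_linear`, `one_hot`), the two-state toy chain, the discount factor 0.5, and the small ridge term are all assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd(transitions, features, gamma, ridge=1e-9):
    """Estimate theta such that V(s) ~ features(s) . theta from one sample path.

    transitions: list of (state, cost, next_state) tuples.
    ridge: small diagonal term (an assumption) that keeps A invertible.
    """
    k = len(features(transitions[0][0]))
    A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for s, cost, s_next in transitions:
        phi, phi_next = features(s), features(s_next)
        for i in range(k):
            b[i] += phi[i] * cost
            for j in range(k):
                # Accumulate phi(s) * (phi(s) - gamma * phi(s'))^T
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(A, b)


def one_hot(state):
    """Tabular features for the hypothetical two-state chain."""
    return [1.0, 0.0] if state == 0 else [0.0, 1.0]


# Hypothetical sample path: state 0 -> 1 costs 1, state 1 -> 0 costs 0.
path = []
state = 0
for _ in range(100):
    next_state = 1 - state
    path.append((state, 1.0 if state == 0 else 0.0, next_state))
    state = next_state

theta = lstd(path, one_hot, gamma=0.5)
print(theta)
```

For this toy chain the exact discounted costs are V(0) = 4/3 and V(1) = 2/3, so `theta` should come out very close to those values. In an actor-critic scheme of the kind the abstract describes, an estimate like this would inform the update of the policy parameters; that step is omitted here.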
