Abstract: Since the greedy strategy used by the traditional hierarchical clustering method may fail to produce an optimal clustering, a hierarchical clustering method based on the rollout policy is proposed to optimize and improve the resulting clustering scheme. The relationships and constraints among decision-making entities, platforms, and tasks during clustering are analyzed. Taking the execution time of combat tasks as the workload measure, a mathematical model of the problem is established with the root mean square (RMS) of the decision-making entities' workloads as the objective function. With the task-platform assignment as input, the rollout policy is applied under a minimum-RMS platform-merging criterion to optimize each level of the hierarchical clustering, yielding an optimized assignment of platforms to decision-making entities. Finally, simulation analysis on a joint-operations example and a general example verifies the feasibility and superiority of the proposed method.
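The rollout step can be sketched concretely. Below is a minimal Python sketch, assuming each platform (or cluster of platforms) is represented only by its total workload, i.e. the summed execution time of its assigned tasks, and ignoring the paper's platform-task feasibility constraints; all function and variable names are illustrative, not from the paper. At each level, every candidate merge is scored not by its immediate RMS but by the RMS of the complete clustering obtained when the remaining levels are finished with the plain greedy rule.

```python
import itertools
import math

def rms(workloads):
    """Root mean square of decision-entity workloads (the objective)."""
    return math.sqrt(sum(w * w for w in workloads) / len(workloads))

def greedy_merge_to(clusters, k):
    """Base heuristic: greedily merge workload sums down to k clusters,
    at each level picking the pair whose merge gives the smallest RMS."""
    clusters = list(clusters)
    while len(clusters) > k:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            trial = [w for idx, w in enumerate(clusters) if idx not in (i, j)]
            trial.append(clusters[i] + clusters[j])
            score = rms(trial)
            if best is None or score < best[0]:
                best = (score, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [w for idx, w in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

def rollout_cluster(platform_workloads, n_entities):
    """Rollout-improved hierarchical clustering: score each candidate merge
    by completing the remaining levels with the greedy rule, and commit to
    the merge whose completed solution has the smallest final RMS."""
    clusters = list(platform_workloads)
    while len(clusters) > n_entities:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            trial = [w for idx, w in enumerate(clusters) if idx not in (i, j)]
            trial.append(clusters[i] + clusters[j])
            # Rollout: simulate the greedy completion, score the end state.
            score = rms(greedy_merge_to(trial, n_entities))
            if best is None or score < best[0]:
                best = (score, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [w for idx, w in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical usage: group five platforms under two decision entities.
entities = rollout_cluster([3.0, 5.0, 2.0, 8.0, 4.0], n_entities=2)
```

In this deterministic setting the rollout solution is guaranteed to be at least as good as the purely greedy one, since the greedy continuation is always among the candidates it evaluates; this is the standard sequential-improvement property of rollout and the reason it can escape the locally optimal merges that trap the greedy method.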
Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
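A minimal sketch of the aggregation step, assuming the simplest "hard" aggregation in which each state's features are summarized by one discrete value that directly identifies its aggregate state, with uniform disaggregation weights; the paper's feature construction via deep neural networks is not shown, and all names are illustrative. The small aggregate problem is solved exactly by value iteration, and its cost vector is lifted back to the original states as a piecewise-constant (hence nonlinear in the features) approximation.

```python
import numpy as np

def solve_aggregate_mdp(P, g, phi, n_agg, gamma=0.95, iters=1000):
    """Hard-aggregation sketch for a discounted MDP.

    P     : (m, n, n) array, P[u, s, t] = transition probability s -> t under u
    g     : (n, m) array, g[s, u] = one-stage cost
    phi   : length-n int array, phi[s] = feature value / aggregate state of s
            (every value in 0..n_agg-1 is assumed to occur at least once)
    Returns J, a length-n cost approximation with J[s] = r[phi[s]].
    """
    m, n, _ = P.shape
    members = [np.flatnonzero(phi == x) for x in range(n_agg)]

    # Aggregate dynamics and costs under uniform disaggregation weights.
    P_agg = np.zeros((m, n_agg, n_agg))
    g_agg = np.zeros((n_agg, m))
    for x, S in enumerate(members):
        w = 1.0 / len(S)
        for u in range(m):
            g_agg[x, u] = w * g[S, u].sum()
            for y, T in enumerate(members):
                P_agg[u, x, y] = w * P[u][np.ix_(S, T)].sum()

    # Value iteration on the (much smaller) aggregate problem.
    r = np.zeros(n_agg)
    for _ in range(iters):
        r = np.min(g_agg + gamma * np.einsum('uxy,y->xu', P_agg, r), axis=1)

    # Lift back: piecewise-constant approximation over the feature values.
    return r[phi]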