Abstract
Computer-aided command and scheduling of carrier aircraft operations on the flight deck cannot be separated from human decision making. To address this, a reinforcement learning method based on inverse learning is introduced that takes the demonstrations of commanders or experts as the learning target. By analyzing aircraft operations on deck, a Markov decision process (MDP) framework for flight deck scheduling is established. Using a linear approximation, the reward function is computed by the inverse learning method, so that an optimized policy can then be obtained by reinforcement learning and used to generate the flight deck schedule. Simulation experiments show that the proposed method learns the expert demonstrations well, that the results satisfy the requirements of schedule optimization, and that the method provides a basis for computer-aided decision making.
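The abstract does not spell out the features or the exact inverse learning algorithm, so the following is only a minimal sketch of one common reading of "linear approximation": the reward is assumed to be a weighted sum of state features, and the weights are taken as the normalized difference between the expert's and a baseline policy's discounted feature expectations (the first step of a feature-matching scheme). The state encoding `(aircraft_waiting, catapult_busy)`, the feature map `phi`, and the trajectories are hypothetical and do not come from the paper.

```python
import math

def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature sum over a list of state trajectories."""
    dim = len(phi(trajectories[0][0]))
    mu = [0.0] * dim
    for traj in trajectories:
        for t, state in enumerate(traj):
            discount = gamma ** t
            for i, f in enumerate(phi(state)):
                mu[i] += discount * f
    return [m / len(trajectories) for m in mu]

def linear_reward(w, features):
    """Linear reward model: R(s) = w . phi(s)."""
    return sum(wi * fi for wi, fi in zip(w, features))

def reward_weights(mu_expert, mu_policy):
    """Unit vector pointing from the policy's feature expectations to the expert's."""
    diff = [e - p for e, p in zip(mu_expert, mu_policy)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff] if norm > 0 else diff

# Hypothetical toy state: (aircraft_waiting, catapult_busy).
def phi(state):
    waiting, busy = state
    return [float(waiting), float(busy)]

# Hypothetical demonstrations: the expert clears the queue, the baseline idles.
expert = [[(2, 0), (1, 1), (0, 1)]]
baseline = [[(2, 0), (2, 0), (2, 0)]]

mu_e = feature_expectations(expert, phi, gamma=0.5)
mu_b = feature_expectations(baseline, phi, gamma=0.5)
w = reward_weights(mu_e, mu_b)

# Under the recovered weights, the expert trajectory scores higher than the baseline.
expert_score = linear_reward(w, mu_e)
baseline_score = linear_reward(w, mu_b)
```

In this toy case the recovered weights penalize waiting aircraft and reward catapult use, so any reinforcement learning method that maximizes `linear_reward` would be pushed toward the expert's behavior. A full method would iterate between solving the MDP under the current weights and updating the weights.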
Source
《国防科技大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2013, No. 4, pp. 171-175 (5 pages)
Journal of National University of Defense Technology
Funding
National Natural Science Foundation of China (Grant No. 71031007)
Keywords
inverse reinforcement learning
reinforcement learning
aircraft scheduling on flight deck
optimal schedule generation