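Abstract: As the number of space targets grows and aerial targets become increasingly dynamic, observing and localizing those targets becomes ever more important. Many highly dynamic targets must be observed at the same time while constellation observation resources are limited, so the multi-target cooperative observation scheme has to be adjusted dynamically to use those resources efficiently and to give every target good localization accuracy. This requires solving the task-planning problem of constellation cooperative observation of multiple targets. The paper builds a constellation attitude and orbit model, a target flight model, and a cooperative target detection and localization model; proposes a target localization error prediction model based on the geometric dilution of precision (GDOP) together with a target observation priority model; establishes a reinforcement-learning-based cooperative observation task-planning framework; builds the policy network with a multi-head self-attention mechanism; and trains the task-planning algorithm with proximal policy optimization. Simulations show that the proposed method improves multi-target observation accuracy and effective tracking time compared with traditional heuristic methods, and computes faster than a genetic algorithm.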
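To make the GDOP idea concrete, the sketch below computes GDOP for a single target from observer positions. It is only an illustration under stated assumptions: the abstract does not give the paper's measurement model, so range-type measurements are assumed here, with GDOP taken as sqrt(trace((HᵀH)⁻¹)) where the rows of H are unit line-of-sight vectors. The function name `gdop` and the singularity threshold are hypothetical.

```python
import math

def gdop(target, observers):
    """GDOP of a 3-D position fix, assuming range-type measurements.

    Assumption (not from the abstract): H has one unit line-of-sight
    row per observer and GDOP = sqrt(trace((H^T H)^-1)).
    """
    # Unit line-of-sight vectors from the target to each observer (rows of H).
    rows = []
    for obs in observers:
        d = [o - t for o, t in zip(obs, target)]
        n = math.sqrt(sum(c * c for c in d))
        rows.append([c / n for c in d])
    # A = H^T H, a symmetric 3x3 matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
           - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
           + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    if abs(det) < 1e-12:  # hypothetical threshold for degenerate geometry
        return math.inf
    # trace(A^-1) equals the sum of the principal 2x2 minors divided by det(A).
    minors = ((a[1][1] * a[2][2] - a[1][2] * a[2][1])
              + (a[0][0] * a[2][2] - a[0][2] * a[2][0])
              + (a[0][0] * a[1][1] - a[0][1] * a[1][0]))
    return math.sqrt(minors / det)

# Well-spread observers give a lower GDOP than tightly clustered ones.
spread = gdop((0, 0, 0), [(1, 0, 0), (0, 2, 0), (0, 0, 3)])
clustered = gdop((0, 0, 0), [(10, 0, 0), (10, 1, 0), (10, 0, 1)])
```

A planner could use such a value as a predicted localization error when ranking candidate satellite-to-target assignments, with lower GDOP indicating better observation geometry.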
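Abstract: To improve the visual navigation capability of mobile robots in mapless settings and to raise the navigation success rate, a mobile robot visual navigation model is proposed that combines a long short-term memory (LSTM) network with the proximal policy optimization (PPO) algorithm. First, the model fuses LSTM and PPO as the network model for visual navigation. Second, a reward function is designed from factors such as the robot's actions, its distance to the target, and its elapsed motion time, and is used to train the robot toward the target. Finally, the model takes as input the RGB-D images from the robot's first-person view together with the polar coordinates of the target point, and outputs continuous action values. This yields mapless, end-to-end visual navigation that can reach, by inference, new targets that were not seen during training. Compared with the preceding algorithm, the model converges faster in simulated environments; the navigation success rate rises by 17.7% on average for old targets and by 23.3% for new targets, indicating good navigation performance.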
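The abstract names three ingredients of the reward function (robot action, distance to the target, motion time) without giving the formula. The following is a minimal sketch of how such a shaped reward might be composed. The function `step_reward`, the choice of angular speed as the action term, and all weights are hypothetical and are not taken from the paper.

```python
def step_reward(prev_dist, dist, angular_speed, reached,
                w_progress=1.0, w_turn=0.05, time_penalty=0.01,
                goal_bonus=10.0):
    """Per-step reward built from distance, action, and time terms.

    All weights are illustrative assumptions, not the paper's values.
    """
    if reached:
        return goal_bonus                  # terminal bonus at the target
    progress = prev_dist - dist            # positive when moving closer
    return (w_progress * progress          # distance term
            - w_turn * abs(angular_speed)  # action term: discourage sharp turns
            - time_penalty)                # time term: constant cost per step

# Moving 0.5 m closer while driving straight yields a positive reward.
r = step_reward(prev_dist=2.0, dist=1.5, angular_speed=0.0, reached=False)
```

The constant per-step cost is one common way to encode motion time, since an agent that dawdles accumulates more penalty than one that reaches the target quickly.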
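Abstract: Producing "green hydrogen" from wind and solar renewable energy is an important route to low-carbon energy, but the volatility and intermittency of wind and solar power cause wind and solar curtailment in the system. To address this, a grid-connected renewable-energy hydrogen production system is constructed. Conventional CPLEX-based scheduling requires accurate forecast data, and the supervisory strategy based on the state control method does not control the system well enough. To overcome these drawbacks, coordinated control is recast as a sequential decision problem and solved with a deep reinforcement learning algorithm, continuous proximal policy optimization. A deep reinforcement learning model suited to the scheduling of renewable hydrogen production systems (renewable energy to hydrogen-proximal policy optimization, R2H-PPO) is designed for conditions in which generation, load, and other factors vary. After sufficient training, the model can make control decisions online. It is compared with a day-ahead control scheme and with the supervisory strategy based on the state control method, showing that the adopted method avoids the shortcomings of the conventional schemes and handles scenarios across different times of day, weather conditions, and seasons. The results demonstrate the feasibility and effectiveness of the proposed R2H-PPO method.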
Funding: the Project of National Natural Science Foundation of China (Grant No. 62106283); the Project of National Natural Science Foundation of China (Grant No. 72001214), to provide funds for conducting experiments; the Project of Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-484).
Abstract: Ground-to-air confrontation involves large-scale task assignment, with many concurrent assignments and random events to handle. When existing task assignment methods are applied to ground-to-air confrontation, they handle complex tasks inefficiently and give rise to interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Because traditional dynamic task assignment algorithms are slow, the paper also proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Building on the idea of the optimal assignment strategy algorithm and on the training framework of deep reinforcement learning (DRL), the algorithm adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out in a digital battlefield. The OGMN-based multiagent architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and achieves a higher win ratio. An analysis of agent behavior verifies the efficiency, superiority, and rational resource utilization of the method.
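All four abstracts rely on proximal policy optimization. For orientation, here is a minimal sketch of the standard PPO clipped surrogate objective. It shows plain PPO only; it is not the bilateral band clipping variant named in the last abstract, whose details are not given here. The function name and the default `eps=0.2` are illustrative choices.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # pi_new / pi_old
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))  # keep ratio in [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)

# A ratio of 2 with a positive advantage is capped at (1 + eps) * advantage.
value = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

Taking the minimum of the clipped and unclipped terms removes the incentive to push the probability ratio far from 1 in a single update, which is what keeps each policy step "proximal".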