Abstract: Deep deterministic policy gradient (DDPG) has been shown to be effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work investigates this question. Experiments showed that DDPG not only speeds up the convergence of MODPSO but also overcomes the local-optimum problem that MODPSO may suffer from. The findings are of great significance for the theoretical research and application of MODPSO.
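Abstract: To address the poor performance of current exoskeleton robots during trajectory motion, a deep deterministic policy gradient (DDPG) reinforcement learning algorithm that combines prioritized experience replay with partitioned rewards (PERDA), termed PERDA-DDPG, is proposed. The method ranks experiences by the magnitude of their temporal-difference errors (TD-errors), which changes the original sampling strategy. In addition, in contrast to the binary reward functions used previously, a partitioned reward tailored to the physical model is proposed. The simulation environment is implemented on the OpenAI Gym platform, and the experimental results show that the improved algorithm converges about 9.2% faster and that the learning process is more stable.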
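The TD-error ranking described in the abstract above can be illustrated with a minimal sketch of rank-based prioritized sampling. This is not the authors' implementation: the function names, the exponent `alpha`, and the 1/rank priority rule are assumptions chosen for illustration, and the abstract does not say whether PERDA-DDPG uses rank-based or proportional priorities.

```python
import random


def rank_based_probabilities(td_errors, alpha=0.7):
    """Map TD-errors to sampling probabilities using their rank by magnitude.

    Rank 1 is the transition with the largest |TD-error|. The priority rule
    (1 / rank) ** alpha and the default alpha are illustrative assumptions.
    """
    # Indices of transitions, ordered from largest to smallest |TD-error|.
    order = sorted(range(len(td_errors)),
                   key=lambda i: abs(td_errors[i]),
                   reverse=True)
    priorities = [0.0] * len(td_errors)
    for rank, idx in enumerate(order, start=1):
        priorities[idx] = (1.0 / rank) ** alpha
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, alpha=0.7, rng=None):
    """Draw replay-buffer indices (with replacement) according to the priorities."""
    rng = rng or random.Random()
    probs = rank_based_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical TD-errors for a buffer of five stored transitions.
    errors = [0.1, -2.0, 0.5, 0.05, 1.2]
    print(rank_based_probabilities(errors))
    print(sample_indices(errors, batch_size=3, rng=random.Random(0)))
```

With uniform replay every transition is equally likely to be drawn; here transitions with larger TD-error magnitude are drawn more often, which is the change in sampling strategy the abstract refers to.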
Funding: Supported by the Key Research and Development Program of Shaanxi (2022GY-089) and the Natural Science Basic Research Program of Shaanxi (2022JQ-593).
Abstract: The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches, one based on value iteration and one based on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense method based on DDPG for swarms of unmanned aerial vehicles (UAVs) is developed and validated, and it shows promising practical value in defense effectiveness. We address the sparse-reward problem of reinforcement learning in a long-term task by constructing a reward function for UAV swarms, and we optimize the learning process of the artificial neural network in the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy and promoting the intelligent development of UAV swarms and their decision-making.
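As a rough illustration of how DDPG couples a value estimate with a deterministic policy, the sketch below shows two pieces of the standard algorithm: the critic's bootstrapped TD target and the soft (Polyak) update of the target networks. It is not the paper's UAV implementation; the function names, the flat parameter lists, and the default values of `gamma` and `tau` are assumptions made for illustration.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q stands for the target critic's value at the next state, evaluated
    at the action chosen by the target actor (both networks are omitted here).
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Move target parameters a small step toward the online parameters.

    Parameters are plain lists of floats for illustration; in practice they
    would be network weight tensors.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


if __name__ == "__main__":
    # Hypothetical transition: reward 1.0, target critic predicts 2.0 at s'.
    print(td_target(1.0, 2.0, done=False, gamma=0.5))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

Slowly changing target networks keep the regression target from shifting abruptly between updates, which is one common way to damp oscillation in training. Whether the paper relies on this mechanism or on a different modification is not stated in the abstract.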