By integrating deep neural networks with reinforcement learning,the Double Deep Q Network(DDQN)algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning...By integrating deep neural networks with reinforcement learning,the Double Deep Q Network(DDQN)algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots.However,the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data.Targeting those problems,an improved DDQN algorithm based on average Q-value estimation and reward redistribution was proposed.First,to enhance the precision of the target Q-value,the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network.Next,a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information.Additionally,a reward-prioritized experience selection method is introduced,which ranks experience samples according to reward values to ensure frequent utilization of high-quality data.Finally,simulation experiments are conducted to verify the effectiveness of the proposed algorithm in fixed-position scenario and random environments.The experimental results show that compared to the traditional DDQN algorithm,the proposed algorithm achieves shorter average running time,higher average return and fewer average steps.The performance of the proposed algorithm is improved by 11.43%in the fixed scenario and 8.33%in random environments.It not only plans economic and safe paths but also significantly improves efficiency and generalization in path planning,making it suitable for widespread application in autonomous navigation and industrial automation.展开更多
针对5G新空口-车联网(New Radio-Vehicle to Everything,NR-V2X)场景下车对基础设施(Vehicle to Infrastructure,V2I)和车对车(Vehicle to Vehicle,V2V)共享上行通信链路的频谱资源分配问题,提出了一种联邦-多智能体深度Q网络(Federated...针对5G新空口-车联网(New Radio-Vehicle to Everything,NR-V2X)场景下车对基础设施(Vehicle to Infrastructure,V2I)和车对车(Vehicle to Vehicle,V2V)共享上行通信链路的频谱资源分配问题,提出了一种联邦-多智能体深度Q网络(Federated Learning-Multi-Agent Deep Q Network,FL-MADQN)算法.该分布式算法中,每个车辆用户作为一个智能体,根据获取的本地信道状态信息,以网络信道容量最佳为目标函数,采用DQN算法训练学习本地网络模型.采用联邦学习加快以及稳定各智能体网络模型训练的收敛速度,即将各智能体的本地模型上传至基站进行聚合形成全局模型,再将全局模型下发至各智能体更新本地模型.仿真结果表明:与传统分布式多智能体DQN算法相比,所提出的方案具有更快的模型收敛速度,并且当车辆用户数增大时仍然保证V2V链路的通信效率以及V2I链路的信道容量.展开更多
Autonomous navigation of mobile robots is a challenging task that requires them to travel from their initial position to their destination without collision in an environment.Reinforcement Learning methods enable a st...Autonomous navigation of mobile robots is a challenging task that requires them to travel from their initial position to their destination without collision in an environment.Reinforcement Learning methods enable a state action function in mobile robots suited to their environment.During trial-and-error interaction with its surroundings,it helps a robot tofind an ideal behavior on its own.The Deep Q Network(DQN)algorithm is used in TurtleBot 3(TB3)to achieve the goal by successfully avoiding the obstacles.But it requires a large number of training iterations.This research mainly focuses on a mobility robot’s best path prediction utilizing DQN and the Artificial Potential Field(APF)algorithms.First,a TB3 Waffle Pi DQN is built and trained to reach the goal.Then the APF shortest path algorithm is incorporated into the DQN algorithm.The proposed planning approach is compared with the standard DQN method in a virtual environment based on the Robot Operation System(ROS).The results from the simulation show that the combination is effective for DQN and APF gives a better optimal path and takes less time when compared to the conventional DQN algo-rithm.The performance improvement rate of the proposed DQN+APF in comparison with DQN in terms of the number of successful targets is attained by 88%.The performance of the proposed DQN+APF in comparison with DQN in terms of average time is achieved by 0.331 s.The performance of the proposed DQN+APF in comparison with DQN average rewards in which the positive goal is attained by 85%and the negative goal is attained by-90%.展开更多
在机器人自主抓取领域,由于抓取对象的大小形状以及分布状态的随机性,仅靠单一的抓取操作完成对工作区域内物体的抓取是十分困难的,而推动和抓取动作的结合可以降低抓取环境的复杂性,通过推动操作可以改变抓取对象的分布以便于更好的抓...在机器人自主抓取领域,由于抓取对象的大小形状以及分布状态的随机性,仅靠单一的抓取操作完成对工作区域内物体的抓取是十分困难的,而推动和抓取动作的结合可以降低抓取环境的复杂性,通过推动操作可以改变抓取对象的分布以便于更好的抓取。但是推动动作的添加同时也会产生一些无效的推动,会降低模型的学习效率。在基于深度Q网络(deep Q-network,DQN)的视觉推动抓取(visual pushing for grasping,VPG)模型的基础上,提出了一种可供性方案用于简化机器人动作规划空间的搜索复杂度,加快机器人抓取的学习进程。通过减少在任何给定情况下可用的行动数量来实现更快的计划,有助于从数据中更高效和精确地学习模型。最后通过在V-rep仿真平台上的仿真场景验证了所提方法的有效性。展开更多
基金funded by National Natural Science Foundation of China(No.62063006)Guangxi Science and Technology Major Program(No.2022AA05002)+1 种基金Key Laboratory of AI and Information Processing(Hechi University),Education Department of Guangxi Zhuang Autonomous Region(No.2022GXZDSY003)Central Leading Local Science and Technology Development Fund Project of Wuzhou(No.202201001).
文摘By integrating deep neural networks with reinforcement learning,the Double Deep Q Network(DDQN)algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots.However,the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data.Targeting those problems,an improved DDQN algorithm based on average Q-value estimation and reward redistribution was proposed.First,to enhance the precision of the target Q-value,the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network.Next,a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information.Additionally,a reward-prioritized experience selection method is introduced,which ranks experience samples according to reward values to ensure frequent utilization of high-quality data.Finally,simulation experiments are conducted to verify the effectiveness of the proposed algorithm in fixed-position scenario and random environments.The experimental results show that compared to the traditional DDQN algorithm,the proposed algorithm achieves shorter average running time,higher average return and fewer average steps.The performance of the proposed algorithm is improved by 11.43%in the fixed scenario and 8.33%in random environments.It not only plans economic and safe paths but also significantly improves efficiency and generalization in path planning,making it suitable for widespread application in autonomous navigation and industrial automation.
文摘针对5G新空口-车联网(New Radio-Vehicle to Everything,NR-V2X)场景下车对基础设施(Vehicle to Infrastructure,V2I)和车对车(Vehicle to Vehicle,V2V)共享上行通信链路的频谱资源分配问题,提出了一种联邦-多智能体深度Q网络(Federated Learning-Multi-Agent Deep Q Network,FL-MADQN)算法.该分布式算法中,每个车辆用户作为一个智能体,根据获取的本地信道状态信息,以网络信道容量最佳为目标函数,采用DQN算法训练学习本地网络模型.采用联邦学习加快以及稳定各智能体网络模型训练的收敛速度,即将各智能体的本地模型上传至基站进行聚合形成全局模型,再将全局模型下发至各智能体更新本地模型.仿真结果表明:与传统分布式多智能体DQN算法相比,所提出的方案具有更快的模型收敛速度,并且当车辆用户数增大时仍然保证V2V链路的通信效率以及V2I链路的信道容量.
文摘针对传统深度Q学习网络(deep Q-learning network,DQN)在具有动态障碍物的路径规划下,移动机器人在探索时频繁碰撞难以移动至目标点的问题,通过在探索策略和经验回放机制上进行改进,提出一种改进的DQN算法。在探索策略上,利用快速搜索随机树(rapidly-exploring random tree,RRT)算法自动生成静态先验知识来指导动作选取,替代ε-贪婪策略的随机动作,提高智能体到达目标的成功率;在经验利用上,使用K-means算法设计一种聚类经验回放机制,根据动态障碍物的位置信息进行聚类分簇,着重采样与当前智能体状态相似的经验进行回放,使智能体更有效地避免碰撞动态障碍物。二维栅格化环境下的仿真实验表明,在动态环境下,该算法可以避开静态和动态障碍物,成功移动至目标点,验证了该算法在应对动态避障路径规划的可行性。
文摘Autonomous navigation of mobile robots is a challenging task that requires them to travel from their initial position to their destination without collision in an environment.Reinforcement Learning methods enable a state action function in mobile robots suited to their environment.During trial-and-error interaction with its surroundings,it helps a robot tofind an ideal behavior on its own.The Deep Q Network(DQN)algorithm is used in TurtleBot 3(TB3)to achieve the goal by successfully avoiding the obstacles.But it requires a large number of training iterations.This research mainly focuses on a mobility robot’s best path prediction utilizing DQN and the Artificial Potential Field(APF)algorithms.First,a TB3 Waffle Pi DQN is built and trained to reach the goal.Then the APF shortest path algorithm is incorporated into the DQN algorithm.The proposed planning approach is compared with the standard DQN method in a virtual environment based on the Robot Operation System(ROS).The results from the simulation show that the combination is effective for DQN and APF gives a better optimal path and takes less time when compared to the conventional DQN algo-rithm.The performance improvement rate of the proposed DQN+APF in comparison with DQN in terms of the number of successful targets is attained by 88%.The performance of the proposed DQN+APF in comparison with DQN in terms of average time is achieved by 0.331 s.The performance of the proposed DQN+APF in comparison with DQN average rewards in which the positive goal is attained by 85%and the negative goal is attained by-90%.
文摘在机器人自主抓取领域,由于抓取对象的大小形状以及分布状态的随机性,仅靠单一的抓取操作完成对工作区域内物体的抓取是十分困难的,而推动和抓取动作的结合可以降低抓取环境的复杂性,通过推动操作可以改变抓取对象的分布以便于更好的抓取。但是推动动作的添加同时也会产生一些无效的推动,会降低模型的学习效率。在基于深度Q网络(deep Q-network,DQN)的视觉推动抓取(visual pushing for grasping,VPG)模型的基础上,提出了一种可供性方案用于简化机器人动作规划空间的搜索复杂度,加快机器人抓取的学习进程。通过减少在任何给定情况下可用的行动数量来实现更快的计划,有助于从数据中更高效和精确地学习模型。最后通过在V-rep仿真平台上的仿真场景验证了所提方法的有效性。