Funding: funded by the National Natural Science Foundation of China (No. 62063006), the Guangxi Science and Technology Major Program (No. 2022AA05002), the Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region (No. 2022GXZDSY003), and the Central Leading Local Science and Technology Development Fund Project of Wuzhou (No. 202201001).
Abstract: By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network replaces the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse-reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples by reward value so that high-quality data are used frequently. Finally, simulation experiments verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps. Its performance improves by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
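To make the three ideas above concrete, the following is a minimal tabular sketch of the averaged target Q-value, reward redistribution, and reward-prioritized sampling steps. It is not the paper's implementation; the snapshot count k, the learning rate, the blending factor alpha, and the rank-based sampling rule are assumptions.

```python
import numpy as np
from collections import deque

class AveragedTargetDDQN:
    """Tabular sketch: the online table selects the action (DDQN style), and
    the target value is averaged over the last k snapshots of the table."""
    def __init__(self, n_states, n_actions, k=5, gamma=0.95, lr=0.1):
        self.q = np.zeros((n_states, n_actions))            # online Q-table
        self.snapshots = deque([self.q.copy()], maxlen=k)   # past target tables
        self.gamma, self.lr = gamma, lr

    def update(self, s, a, r, s2, done):
        a_star = int(np.argmax(self.q[s2]))                 # online table selects
        q_bar = np.mean([snap[s2, a_star] for snap in self.snapshots])
        target = r + (0.0 if done else self.gamma * q_bar)  # averaged target Q
        self.q[s, a] += self.lr * (target - self.q[s, a])

    def sync_target(self):
        self.snapshots.append(self.q.copy())                # push a new snapshot

def redistribute_rewards(rewards, alpha=0.5):
    """Blend each step reward with the mean round reward so a sparse terminal
    reward is spread over the whole trajectory (alpha is an assumption)."""
    rewards = np.asarray(rewards, dtype=float)
    return (1 - alpha) * rewards + alpha * rewards.sum() / len(rewards)

def reward_prioritized_sample(buffer, batch_size, rng=None):
    """Rank transitions by reward and sample with rank-based probabilities so
    high-reward experiences are replayed more often (one plausible reading)."""
    rng = rng or np.random.default_rng()
    order = np.argsort([t["reward"] for t in buffer])[::-1]   # best first
    probs = 1.0 / np.arange(1, len(buffer) + 1)
    probs /= probs.sum()
    idx = rng.choice(order, size=batch_size, p=probs, replace=False)
    return [buffer[i] for i in idx]
```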
Abstract: Autonomous navigation of mobile robots is a challenging task that requires them to travel from their initial position to their destination without collision. Reinforcement learning methods let a mobile robot learn a state-action function suited to its environment: through trial-and-error interaction with its surroundings, the robot finds an ideal behavior on its own. The Deep Q Network (DQN) algorithm is used on the TurtleBot 3 (TB3) to reach the goal while successfully avoiding obstacles, but it requires a large number of training iterations. This research focuses on predicting the best path for a mobile robot by combining the DQN and Artificial Potential Field (APF) algorithms. First, a DQN is built and trained for the TB3 Waffle Pi to reach the goal. Then the APF shortest-path algorithm is incorporated into the DQN algorithm. The proposed planning approach is compared with the standard DQN method in a virtual environment based on the Robot Operating System (ROS). The simulation results show that the combination of DQN and APF is effective: it yields a better path and takes less time than the conventional DQN algorithm. Compared with DQN, the proposed DQN+APF improves the number of successful targets by 88% and the average time by 0.331 s; in terms of average rewards, the positive goal is attained by 85% and the negative goal by -90%.
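As an illustration of how an APF term can steer a DQN policy, here is a minimal sketch of the classic potential field plus one plausible fusion rule. The gains, the influence radius d0, and the action-heading bonus are assumptions, not the paper's exact method.

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=1.0):
    """Classic artificial potential field: attractive pull toward the goal
    plus repulsive push from obstacles inside influence radius d0."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)                       # attractive term
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:                              # only nearby obstacles repel
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return force

def apf_biased_action(q_values, pos, goal, obstacles, headings, beta=0.5):
    """One plausible DQN+APF fusion: add a bonus to actions whose heading
    aligns with the APF force direction (beta weighting is an assumption).
    headings: one unit direction vector per discrete robot action."""
    f = apf_force(pos, goal, obstacles)
    f = f / (np.linalg.norm(f) + 1e-9)                 # normalize force direction
    bonus = np.array([np.dot(f, h) for h in headings]) # alignment per action
    return int(np.argmax(np.asarray(q_values) + beta * bonus))
```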
Funding: supported by the National Key Research and Development Program of China (No. 2021YFE0116900), the National Natural Science Foundation of China (Nos. 42275157, 62002276, and 41975142), and the Major Program of the National Social Science Fund of China (No. 17ZDA092).
Abstract: Edge computing nodes undertake an increasing number of tasks as business density rises. How to efficiently allocate large-scale, dynamic workloads to edge computing resources has therefore become a critical challenge. This study proposes an edge task scheduling approach based on an improved Double Deep Q Network (DQN), which separates the calculation of target Q-values and the selection of actions into two networks. A new reward function is designed, and a control unit is added to the agent's experience replay unit; the management of experience data is also modified to fully utilize its value and improve learning efficiency. Reinforcement learning agents usually learn from an ignorant initial state, which is inefficient. This study therefore also proposes a novel particle swarm optimization algorithm with an improved fitness function, which generates optimal solutions for task scheduling. These optimized solutions are provided to the agent to pre-train the network parameters and reach a better level of cognition. The proposed algorithm is compared with six other methods in simulation experiments. The results show that it outperforms the benchmark methods in terms of makespan.
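The pre-training idea can be sketched as a plain PSO loop whose fitness is the schedule makespan; the simplified execution model, the swarm parameters, and the decoding scheme below are assumptions rather than the paper's improved fitness function.

```python
import numpy as np

def makespan(assignment, task_len, node_speed):
    """Makespan of mapping task i -> node assignment[i] (simplified model)."""
    finish = np.zeros(len(node_speed))
    for t, n in enumerate(assignment):
        finish[n] += task_len[t] / node_speed[n]
    return finish.max()

def pso_schedule(task_len, node_speed, n_particles=30, iters=200, rng=None):
    """Minimal PSO over continuous positions decoded to node indices; the
    decoded best schedules would seed the agent's pre-training (a sketch)."""
    rng = rng or np.random.default_rng(0)
    n_tasks, n_nodes = len(task_len), len(node_speed)
    x = rng.uniform(0, n_nodes, (n_particles, n_tasks))     # particle positions
    v = np.zeros_like(x)                                    # particle velocities
    decode = lambda p: np.clip(p.astype(int), 0, n_nodes - 1)
    pbest = x.copy()
    pbest_f = np.array([makespan(decode(p), task_len, node_speed) for p in x])
    g = pbest[pbest_f.argmin()].copy()                      # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, 0, n_nodes - 1e-9)
        f = np.array([makespan(decode(p), task_len, node_speed) for p in x])
        better = f < pbest_f                                # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return decode(g), pbest_f.min()
```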
Funding: supported by the National Natural Science Foundation of China (No. 62073006) and the Beijing Natural Science Foundation of China (No. 4212032).
Abstract: Deep stochastic configuration networks (DSCNs) produce redundant hidden nodes and connections during training, which complicates their model structures. To address this problem, this paper proposes a double pruning structure design algorithm for DSCNs based on mutual information and relevance. During training, a mutual information algorithm calculates and sorts the importance scores of the nodes in each hidden layer, layer by layer; the node pruning rate of each layer is set according to the current depth of the DSCN, nodes that contribute little to the model are deleted, and the network parameters are updated. When the model completes the configuration procedure, a correlation evaluation strategy sorts the global connection weights and deletes insignificant connections; the network parameters are then updated after pruning. The experimental results show that the proposed structure design method effectively compresses the scale of a DSCN model and improves its modeling speed; the loss in model accuracy is small, and no fine-tuning is needed to restore accuracy. The resulting DSCN model has practical value in the field of regression analysis.
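A rough sketch of the two pruning passes follows, using scikit-learn's mutual information estimator for the node ranking; the per-layer pruning-rate schedule and the use of weight magnitude as the relevance score are assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def prune_layer_by_mi(H, y, prune_rate):
    """Score each hidden node's output column by its mutual information with
    the target and drop the lowest-scoring fraction (the node-pruning pass).
    H: (n_samples, n_nodes) hidden outputs; y: (n_samples,) targets."""
    scores = mutual_info_regression(H, y)            # MI of each node vs. target
    n_keep = max(1, int(round(H.shape[1] * (1 - prune_rate))))
    keep = np.argsort(scores)[::-1][:n_keep]         # highest-MI nodes survive
    return np.sort(keep)                             # indices of retained nodes

def prune_weights_by_relevance(W, keep_fraction=0.8):
    """Global connection pruning pass: zero out the smallest-magnitude weights
    (magnitude stands in here for the paper's relevance score)."""
    flat = np.abs(W).ravel()
    k = int(len(flat) * (1 - keep_fraction))
    if k > 0:
        thresh = np.partition(flat, k)[k]            # k-th smallest magnitude
        W = np.where(np.abs(W) < thresh, 0.0, W)
    return W
```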
Abstract: For the spectrum resource allocation problem in 5G New Radio-Vehicle to Everything (NR-V2X) scenarios, where Vehicle to Infrastructure (V2I) and Vehicle to Vehicle (V2V) links share the uplink, a Federated Learning-Multi-Agent Deep Q Network (FL-MADQN) algorithm is proposed. In this distributed algorithm, each vehicle user acts as an agent that, based on locally acquired channel state information and with optimal network channel capacity as the objective function, trains a local network model with the DQN algorithm. Federated learning is used to accelerate and stabilize the convergence of each agent's model training: the agents' local models are uploaded to the base station and aggregated into a global model, which is then sent back to the agents to update their local models. Simulation results show that, compared with the traditional distributed multi-agent DQN algorithm, the proposed scheme converges faster and still guarantees the communication efficiency of the V2V links and the channel capacity of the V2I links as the number of vehicle users grows.
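The federated step at the heart of FL-MADQN can be sketched as plain federated averaging; `train_local` and `load` below are placeholder agent methods, not a real API, and uniform aggregation weights are an assumption.

```python
import numpy as np

def fedavg(local_models, weights=None):
    """Aggregation at the base station: average per-agent DQN parameters into
    a global model. local_models: list of dicts {param_name: ndarray}."""
    weights = weights or [1.0 / len(local_models)] * len(local_models)
    global_model = {}
    for name in local_models[0]:
        global_model[name] = sum(w * m[name] for w, m in zip(weights, local_models))
    return global_model

def federated_round(agents):
    """One communication round (sketch): agents train locally on their own
    channel state information, upload, then download the aggregated model."""
    local_models = [agent.train_local() for agent in agents]  # local DQN updates
    global_model = fedavg(local_models)                       # aggregate at base station
    for agent in agents:
        agent.load(global_model)                              # refresh local model
    return global_model
```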