A real-time adaptive role allocation method based on reinforcement learning is proposed to improve human-robot cooperation performance in a curtain wall installation task. The method departs from the traditional view in which the robot is treated purely as a follower, or in which cooperation only switches between leader and follower. In this paper, a self-learning method is proposed that dynamically and continuously adjusts the robot's initiative weight as the task changes. First, a physical human-robot cooperation model that includes the role factor is built. Then, a reinforcement learning model that can adjust the role factor in real time is established, and a reward and action model is designed. The role factor is adjusted continuously according to the combined performance of the human-robot interaction force and the robot's jerk during repeated installation. Finally, the role adjustment rule established above continuously improves the combined performance. Experiments examine dynamic role allocation and the effect of the performance weighting coefficient on the result. The results show that the proposed method achieves role adaptation and meets the dual optimization goal of reducing the sum of the cooperator's force and the robot's jerk.
This article studies the problem of effective traffic signal control across multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance than state-of-the-art models.
In college badminton teaching, teachers use the group cooperative learning method, which not only helps improve students' badminton skills but also implicitly cultivates their teamwork, communication skills, and self-management ability. In view of this, this paper describes the significance of applying the group cooperative learning method in college badminton teaching, analyzes the current problems in college badminton teaching, and seeks effective strategies for developing the method in order to improve teaching effectiveness.
This study investigates the application of a teaching model that combines cooperative learning and flipped classrooms in university basketball courses in China. By analyzing the advantages and disadvantages of the traditional basketball teaching model and students' satisfaction with the course, the study argues that implementing cooperative learning and flipped classrooms is necessary. It plans in detail the implementation strategies before class, in the classroom, and after class, and compares them with a control group through an experimental design. The experimental results show that the new teaching model has significant advantages in learning outcomes, student satisfaction, and teacher evaluation. This study provides a valuable reference for future reform of the physical education curriculum.
With the development of digital transformation, intelligent algorithms are receiving more and more attention. The whale optimization algorithm (WOA) is a swarm intelligence optimization algorithm that is widely used to solve practical engineering optimization problems. However, as problem dimensions increase, the demands on algorithm performance become higher. A double population whale optimization algorithm with distributed collaboration and reverse learning ability (DCRWOA) is proposed to address the slow convergence and unstable search accuracy of WOA in optimization problems. In DCRWOA, a novel double population search strategy is constructed. In addition, a reverse learning strategy is adopted in the population search process to help individuals quickly jump out of non-ideal search areas. Numerical experiments are carried out using standard test functions with different dimensions (10, 50, 100, 200). An optimization case involving shield construction parameters is also used to test the practical performance of the proposed algorithm. The results show that DCRWOA has higher optimization accuracy and stability, and that its convergence speed is significantly improved. The proposed DCRWOA algorithm therefore provides a better method for solving practical optimization problems.
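The abstract does not spell out the reverse learning strategy. A minimal sketch, assuming it follows the common opposition-based learning form in which a candidate x in [lower, upper] is mirrored to lower + upper - x and the better of the two is kept, might look like this in Python. The function names and the sphere test function are illustrative, not taken from the paper.

```python
def opposite_point(x, lower, upper):
    """Mirror x inside the per-dimension bounds [lower, upper]."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def reverse_learning_step(population, lower, upper, fitness):
    """For each individual, keep whichever of (x, opposite of x) has lower fitness.

    Assumes a minimization problem; this is a generic opposition-based
    learning step, not the exact DCRWOA update.
    """
    improved = []
    for x in population:
        x_opp = opposite_point(x, lower, upper)
        improved.append(x_opp if fitness(x_opp) < fitness(x) else x)
    return improved


def sphere(x):
    """Standard sphere test function, minimized at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    pop = [[9.0, 8.0], [1.0, 1.0]]
    print(reverse_learning_step(pop, [0.0, 0.0], [10.0, 10.0], sphere))
```

An individual that sits far from the optimum is replaced by its mirror image when the mirror scores better, which is one way a population can leave a poor region in a single step.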
Motion planning is critical to the autonomous operation of mobile robots. As the complexity and randomness of robot application scenarios increase, the planning capability of classical hierarchical motion planners is challenged. With the development of machine learning, the deep reinforcement learning (DRL)-based motion planner has gradually become a research hotspot because of several advantageous features. The DRL-based motion planner is model-free and does not rely on a prior structured map. Most importantly, it unifies the global planner and the local planner. In this paper, we provide a systematic review of motion planning methods. First, we summarize representative and state-of-the-art works for each submodule of the classical motion planning architecture and analyze their performance features. Then, we concentrate on reinforcement learning (RL)-based motion planning approaches, including motion planners combined with RL improvements, map-free RL-based motion planners, and multi-robot cooperative planning methods. Finally, we analyze in detail the urgent challenges faced by these mainstream RL-based motion planners, review some state-of-the-art works on these issues, and propose suggestions for future research.
We studied the effect of population density in a spatial public goods game. We found that its effect on the evolution of cooperation is very complex when long-range strategy learning and long-range mobility of players are considered on a two-dimensional lattice. When the learning range is larger than the mobility range, the system is driven into a cooperative state at low population density, because a small local group helps sustain a high level of cooperation. As population density increases to a moderate range, the mobility of players away from a domain invaded by defectors supports the evolutionary stability of cooperation. When the mobility range is larger than the learning range, the formation of compact domains of cooperators promotes cooperation as the population density becomes high.
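The abstract does not state the payoff rule. For orientation, the sketch below uses the standard textbook public goods game, in which each cooperator pays a cost, the pooled contributions are multiplied by a synergy factor r, and the result is shared equally among all group members. The parameter names r and cost are assumptions for illustration only.

```python
def public_goods_payoffs(strategies, r, cost=1.0):
    """Payoffs for one group in a standard public goods game.

    strategies: list of booleans, True for cooperator, False for defector.
    r: synergy (multiplication) factor applied to the common pool.
    cost: contribution paid by each cooperator.
    """
    group_size = len(strategies)
    n_cooperators = sum(strategies)
    share = r * cost * n_cooperators / group_size
    # Everyone receives the same share; only cooperators pay the cost.
    return [share - cost if s else share for s in strategies]


if __name__ == "__main__":
    # Two cooperators and two defectors with r = 2.
    print(public_goods_payoffs([True, True, False, False], r=2.0))
```

In this form a defector always earns more than a cooperator in the same group, which is why spatial structure, group size, and mobility matter for whether cooperation survives.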
Integrating blockchain technology into mobile-edge computing (MEC) networks with multiple cooperative MEC servers (MECS) provides a promising way to improve resource utilization and helps establish a secure reward mechanism that can facilitate load balancing among MECS. In addition, intelligent management of service caching and load balancing can improve network utility in MEC blockchain networks with multiple types of workloads. In this paper, we investigate a learning-based joint service caching and load balancing policy for optimizing the allocation of communication and computation resources, so as to improve the resource utilization of MEC blockchain networks. We formulate the problem as a challenging long-term network revenue maximization Markov decision process (MDP). To address the highly dynamic, high-dimensional system states, we design a joint service caching and load balancing algorithm based on the double-dueling deep Q-network (DQN) approach. The simulation results validate the feasibility and superior performance of our proposed algorithm over several baseline schemes.
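The abstract names the double-dueling DQN approach without giving details. The sketch below shows only the two generic ingredients as they are usually defined: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A), and the double DQN target, where the online network picks the next action and the target network evaluates it. Plain lists stand in for network outputs, and all names are illustrative rather than the paper's.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN bootstrap target for a single transition.

    q_online_next: Q-values of the next state from the online network.
    q_target_next: Q-values of the next state from the target network.
    """
    if done:
        return reward
    # Action selection uses the online network, evaluation uses the target network.
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(dueling_q_values(2.0, [1.0, 0.0, -1.0]))
    print(double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0], done=False))
```

Separating action selection from evaluation is what reduces the overestimation bias of the plain DQN target, while the dueling split lets the value of a state be learned without evaluating every action.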
As 5G becomes commercial, researchers have turned their attention toward the sixth-generation (6G) network, with the vision of connecting intelligence in a green, energy-efficient manner. Federated learning has triggered an upsurge of green intelligent services, such as resource orchestration of communication infrastructures, while preserving privacy and increasing communication efficiency. However, designing effective incentives in federated learning is challenging because the set of available clients changes dynamically and clients' contributions are correlated during the learning process. In this paper, we propose a dynamic incentive and reputation mechanism to improve the energy efficiency and training performance of federated learning. The proposed incentive, based on the Stackelberg game, can adjust the optimal energy consumption in a timely manner as the available clients change during federated learning. Meanwhile, clients' contributions in reputation management are formulated based on a cooperative game to capture the correlation between tasks, in a way that satisfies availability, fairness, and additivity. The simulation results show that the proposed scheme can significantly motivate high-performance clients to participate in federated learning and improve the accuracy and energy efficiency of the federated learning model.
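The abstract does not say which cooperative-game solution concept is used to measure client contributions. One widely used concept with fairness and additivity properties is the Shapley value, so the sketch below computes exact Shapley values by enumerating player orderings. Treating it as the paper's method would be an assumption, and the toy value function is purely illustrative.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    players: list of hashable player identifiers.
    value: function mapping a frozenset of players to the coalition's worth.
    Cost grows factorially with the number of players, so this suits small examples only.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


if __name__ == "__main__":
    # Toy game: the coalition is worth 1 only if clients "a" and "b" both take part.
    def toy_value(coalition):
        return 1.0 if {"a", "b"} <= coalition else 0.0

    print(shapley_values(["a", "b", "c"], toy_value))
```

In the toy game the two clients whose joint participation creates the value split it equally, and the client that adds nothing receives zero, which illustrates how correlated contributions can be credited.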
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. To deal with the changing dimension of the observational features that arises in partially observable systems, a feature embedding block is proposed. By combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to train the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are analyzed in terms of algorithm performance, transfer across task scenarios, and self-organization capability after damage, and the potential for deploying and applying DPOMH-PPO in a real environment is verified.
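The feature embedding idea can be illustrated with a short sketch. It assumes that each observed entity contributes one row of fixed-length features, and that the CMP and CAP outputs are combined by concatenation; the abstract only says the two are combined, so concatenation and the function names are assumptions.

```python
def column_wise_max_pool(rows):
    """Column-wise max pooling (CMP): one maximum per feature column."""
    return [max(col) for col in zip(*rows)]


def column_wise_avg_pool(rows):
    """Column-wise average pooling (CAP): one mean per feature column."""
    return [sum(col) / len(col) for col in zip(*rows)]


def embed_observation(rows):
    """Map a variable number of observation rows to a fixed-length vector.

    Assumes concatenation of the CMP and CAP results, so the output length
    is always twice the number of feature columns.
    """
    return column_wise_max_pool(rows) + column_wise_avg_pool(rows)


if __name__ == "__main__":
    two_entities = [[1.0, 4.0], [3.0, 2.0]]
    three_entities = [[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]]
    print(embed_observation(two_entities))
    print(len(embed_observation(three_entities)))
```

Because both pooling operations reduce over rows, the encoding has the same length whether a USV observes two entities or three, which is what lets a shared policy network accept observations of varying size.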
Funding (curtain wall role allocation paper): This research was supported by the Tianjin Education Commission Scientific Research Program (2020KJ056), China, and the Tianjin Science and Technology Planning Project (22YDTPJC00970), China. The authors would like to express their sincere appreciation for all support provided.
Funding (RegionSTLight paper): Supported by the National Science and Technology Major Project (2021ZD0112702), the National Natural Science Foundation (NNSF) of China (62373100, 62233003), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Funding (DCRWOA paper): Supported by the Anhui Polytechnic University Introduced Talents Research Fund (No. 2021YQQ064) and the Anhui Polytechnic University Scientific Research Project (No. Xjky2022168).
Funding (motion planning review): Supported by the National Natural Science Foundation of China (62173251), the "Zhishan" Scholars Programs of Southeast University, the Fundamental Research Funds for the Central Universities, and the Shanghai Gaofeng & Gaoyuan Project for University Academic Program Development (22120210022).
Funding (spatial public goods game paper): Supported by the National Natural Science Foundation of China under Grant No. 10575055, the K.C. Wong Magna Fund in Ningbo University, and the Super-computer Center of Ningbo University.
Funding (MEC blockchain paper): Supported in part by the National Natural Science Foundation of China (62072096); the Fundamental Research Funds for the Central Universities under Grant 2232020A-12; the International S&T Cooperation Program of Shanghai Science and Technology Commission under Grant 20220713000; the Young Top-notch Talent Program in Shanghai; the "Shuguang Program" of Shanghai Education Development Foundation and Shanghai Municipal Education Commission; the Fundamental Research Funds for the Central Universities and Graduate Student Innovation Fund of Donghua University (CUSF-DH-D-2019093); and in part by the NSF under grants CNS-2107190 and ECCS-1923717.
Funding (USV multi-target hunting paper): Financial support from the National Natural Science Foundation of China (Grant No. 61601491), the Natural Science Foundation of Hubei Province, China (Grant No. 2018CFC865), and the Military Research Project of China (Grant No. YJ2020B117).