Funding: Supported by the National Natural Science Foundation of China under Grant No. 61103093 and the National High-Tech Research and Development Program of China (863 Program) under Grant No. 2011AA010502.
Abstract: We study two-dimensional traffic in cellular automata using computer simulation. We propose two types of decentralized cooperation strategies, called stepping aside (CS-SA) and choosing alternative routes (CS-CAR), and introduce them into an existing two-dimensional cellular automata (CA) model. CS-SA is designed to prohibit a kind of ping-pong jump that occurs when two adjacent objects try to move in opposite directions. CS-CAR is designed to change how conflicts are resolved under parallel update: instead of waiting, objects involved in a parallel conflict are encouraged to choose alternative routes. We also combine the two strategies (CS-SA-CAR) to test their combined effect. We find that the system remains in a partial-jam phase with nonzero velocity and flow until the density reaches one. The ratios of ping-pong jumps and of waiting objects involved in conflicts are reduced noticeably, especially in the free-flow phase, and the average flow is improved by all three cooperation strategies. Although the average travel time is lengthened slightly by CS-CAR, it is shortened by CS-SA and CS-SA-CAR. In addition, we discuss the advantages and applicability of decentralized cooperation modeling.
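The abstract does not give the CA update rule in detail. As a rough illustration only, the head-on conflict that CS-SA suppresses, and one possible step-aside resolution, can be sketched as follows (the function names and the perpendicular step-aside rule are assumptions, not the paper's actual rule):

```python
def wants_swap(pos_a, move_a, pos_b, move_b):
    """Detect the conflict CS-SA targets: two adjacent objects whose
    intended moves would exchange their positions. Repeated every step,
    this exchange attempt produces the 'ping-pong jump'."""
    next_a = (pos_a[0] + move_a[0], pos_a[1] + move_a[1])
    next_b = (pos_b[0] + move_b[0], pos_b[1] + move_b[1])
    return next_a == pos_b and next_b == pos_a

def step_aside(move):
    """One simple (hypothetical) step-aside choice: swap the move's
    components so the object moves perpendicular to its original
    direction instead of pushing head-on."""
    dx, dy = move
    return (dy, dx)

# Two objects standing together, each trying to move through the other:
a, b = (0, 0), (0, 1)
move_a, move_b = (0, 1), (0, -1)
if wants_swap(a, move_a, b, move_b):
    move_a, move_b = step_aside(move_a), step_aside(move_b)
```

After the check, both objects head sideways rather than oscillating in place, which is the qualitative behavior the strategy is designed to enforce.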
Funding: Supported by the Science and Technology Innovation 2030 Key Project of "New Generation Artificial Intelligence", China (No. 2020AAA0108200) and the National Natural Science Foundation of China (No. 61906209).
Abstract: Multi-Target Tracking Guidance (MTTG) in unknown environments has great potential value in applications for Unmanned Aerial Vehicle (UAV) swarms. Although Multi-Agent Deep Reinforcement Learning (MADRL) is a promising technique for learning cooperation, most existing methods cannot scale well to decentralized UAV swarms because of their computational complexity or their requirement for global information. This paper proposes a decentralized MADRL method that uses a maximum reciprocal reward to learn cooperative tracking policies for UAV swarms. The method reshapes each UAV's reward with a regularization term, defined as the dot product of the reward vector of all neighbor UAVs and the corresponding dependency vector between the UAV and its neighbors. The dependence between UAVs can be captured directly by a Pointwise Mutual Information (PMI) neural network, without complicated aggregation statistics. An experience-sharing Reciprocal Reward Multi-Agent Actor-Critic (MAAC-R) algorithm is then proposed to learn a shared cooperative policy for all homogeneous UAVs. Experiments demonstrate that the proposed algorithm improves UAV cooperation more effectively than the baseline algorithms and stimulates rich cooperative tracking behaviors in UAV swarms. Moreover, the learned policy scales better to scenarios with more UAVs and targets.
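The reward reshaping described above reduces to a dot product between the neighbors' rewards and the dependency weights (which the paper obtains from a PMI network). A minimal sketch of that computation, assuming the dependency weights are already given and a hypothetical scaling coefficient `beta`:

```python
def reciprocal_reward(own_reward, neighbor_rewards, dependencies, beta=1.0):
    """Reshape a UAV's reward with a regularization term: the dot product
    of the neighbors' reward vector and the dependency vector.
    `beta` is an assumed scaling coefficient, not from the paper."""
    if len(neighbor_rewards) != len(dependencies):
        raise ValueError("one dependency weight per neighbor is required")
    regularizer = sum(r * d for r, d in zip(neighbor_rewards, dependencies))
    return own_reward + beta * regularizer

# A UAV with two neighbors: their rewards weighted by dependency.
reshaped = reciprocal_reward(1.0, [0.5, 2.0], [0.2, 0.4])
# 1.0 + (0.5 * 0.2 + 2.0 * 0.4) = 1.9
```

In the actual method the dependency vector would be produced per step by the PMI network; this sketch only shows how the reshaped reward combines the pieces.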