Abstract: This paper discusses Joseph's kindness to the brothers who had once nearly killed him. After years of terrible suffering that began when he was sold into Egypt as a slave, Joseph rose from the bottom to the top and finally became prime minister, helping the Egyptian pharaoh through seven years of abundance and seven years of famine. When his brothers came to Egypt to buy corn to stay alive, Joseph, as prime minister, did not use his power to punish them but forgave them all once he had recognized their repentance. Having forgiven them and asked them to reunite the whole family and their relatives, Joseph also settled his father and brothers in the best of the land, where they dwelt in Goshen with the pharaoh's permission. Conclusion: Every one of us should be as kind as Joseph, forget the wrongs done to us, and forgive those who have betrayed or framed us: a disposition to be lenient in pardoning others.
Funding: The work was supported by the National Natural Science Foundation of China under Grant Nos. 61702362, U1836214, and 61876119; the Special Program of Artificial Intelligence of Tianjin Research Program of Application Foundation and Advanced Technology under Grant No. 16JCQNJC00100; the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission of China under Grant No. 56917ZXRGGX00150; the Science and Technology Program of Tianjin of China under Grant Nos. 15PTCYSY00030 and 16ZXHLGX00170; and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20181432. Acknowledgments: We thank our industrial research partner Netease, Inc., especially the Fuxi AI Laboratory of Leihuo Business Groups, for their discussion and support with the experiments.
Abstract: Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most existing MA-DRL algorithms, however, are still inefficient when faced with the non-stationarity caused by agents consistently changing their behavior in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and a deep neural network, WDDQN can not only reduce the estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
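The abstract names the weighted double estimator but does not state its update rule. As a rough illustration only, the sketch below follows the tabular weighted double Q-learning formulation as commonly described, which may differ from what WDDQN actually implements. The function name `weighted_double_target`, the plain-list inputs, and the constant `c` are assumptions made for this example and do not come from the paper.

```python
def weighted_double_target(q_u, q_v, reward, gamma=0.99, c=1.0):
    """Blend the single (max) and double estimates of the next-state value.

    q_u, q_v: lists of next-state action values from two separate estimators
              (q_u is the one being updated). Hypothetical inputs.
    c:        assumed non-negative constant that controls the blend.
    Returns (target, beta).
    """
    actions = range(len(q_u))
    # Greedy and worst actions according to the estimator being updated.
    a_star = max(actions, key=lambda a: q_u[a])
    a_low = min(actions, key=lambda a: q_u[a])
    # The weight grows with the gap the other estimator sees between them.
    gap = abs(q_v[a_star] - q_v[a_low])
    beta = gap / (c + gap) if (c + gap) > 0 else 0.0
    # beta near 1 leans on the single estimator (tends to overestimate);
    # beta near 0 leans on the double estimator (tends to underestimate).
    value = beta * q_u[a_star] + (1.0 - beta) * q_v[a_star]
    return reward + gamma * value, beta


if __name__ == "__main__":
    target, beta = weighted_double_target(
        [1.0, 3.0], [0.0, 1.0], reward=0.0, gamma=1.0, c=1.0
    )
    print(target, beta)
```

In this sketch a larger `c` pushes the target toward the double estimator and `c = 0` recovers the single max estimator whenever the gap is non-zero, which is one way to picture the bias reduction the abstract refers to.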