Journal Articles
3 articles found
1. An Empirical Study on Google Research Football Multi-agent Scenarios
Authors: Yan Song, He Jiang, Zheng Tian, Haifeng Zhang, Yingping Zhang, jiangcheng zhu, Zonghong Dai, Weinan Zhang, Jun Wang. Machine Intelligence Research (EI, CSCD), 2024, Issue 3, pp. 549-570 (22 pages)
Little multi-agent reinforcement learning (MARL) research on Google Research Football (GRF) has focused on the 11-vs-11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that, trained from scratch, beats the difficulty-1.0 bot within 2 million steps. Our experiments serve as a reference for the expected performance of independent proximal policy optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent optimizes its own policy independently, across various training configurations. Meanwhile, we release our training framework, Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for anyone experimenting on GRF and a simple-to-use population-based training framework for further improving their agents through self-play (see the sketch below). The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
Keywords: multi-agent reinforcement learning (RL), distributed RL system, population-based training, reward shaping, game theory
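As a concrete illustration of the IPPO baseline described in the abstract above, here is a minimal sketch of independent PPO in PyTorch: one actor-critic and one optimizer per controlled player, each updated only on that player's own trajectories. All class and function names, the network sizes, and the GRF-like dimensions are illustrative assumptions, not the Light-MALib API.

    # Minimal IPPO sketch: each agent owns a separate actor-critic and
    # optimizes its own clipped PPO objective on its own trajectories.
    # Names, sizes, and dimensions are illustrative, not Light-MALib code.
    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        def __init__(self, obs_dim, n_actions):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
            self.pi = nn.Linear(64, n_actions)  # policy head
            self.v = nn.Linear(64, 1)           # value head

        def forward(self, obs):
            h = self.body(obs)
            return torch.distributions.Categorical(logits=self.pi(h)), self.v(h)

    def ippo_update(model, opt, obs, acts, old_logp, adv, ret, clip=0.2):
        """One clipped-PPO step on a single agent's own batch."""
        dist, value = model(obs)
        ratio = torch.exp(dist.log_prob(acts) - old_logp)
        surr = torch.min(ratio * adv,
                         torch.clamp(ratio, 1 - clip, 1 + clip) * adv)
        loss = (-surr.mean()
                + 0.5 * (value.squeeze(-1) - ret).pow(2).mean()
                - 0.01 * dist.entropy().mean())
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # Independence: one model and one optimizer per controlled player.
    n_agents, obs_dim, n_actions = 10, 115, 19  # GRF-like sizes, assumed
    models = [ActorCritic(obs_dim, n_actions) for _ in range(n_agents)]
    opts = [torch.optim.Adam(m.parameters(), lr=3e-4) for m in models]

    # Dummy batch for agent 0 (random data, shapes only):
    obs = torch.randn(32, obs_dim)
    acts = torch.randint(n_actions, (32,))
    ippo_update(models[0], opts[0], obs, acts,
                old_logp=torch.zeros(32), adv=torch.randn(32),
                ret=torch.randn(32))

In a population-based setup of the kind the paper describes, copies of such independent learners would additionally be matched against past checkpoints through self-play.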
2. Model Error Correction in Data Assimilation by Integrating Neural Networks (cited 2 times)
Authors: jiangcheng zhu, Shuang Hu, Rossella Arcucci, Chao Xu, Jihong zhu, Yi-ke Guo. Big Data Mining and Analytics, 2019, Issue 2, pp. 83-91 (9 pages)
In this paper, we propose a new methodology that integrates Neural Networks (NN) into Data Assimilation (DA). Focusing on structural model uncertainty, we propose a framework for integrating NNs with physical models via DA algorithms, improving both the assimilation process and the forecasting results. The NNs are iteratively trained as observational data is updated. The main DA models used here are the Kalman filter and variational approaches. The effectiveness of the proposed algorithm is validated by examples and by a sensitivity study (see the sketch below).
Keywords: data assimilation, deep learning, neural networks, Kalman filter, variational approach
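As a rough sketch of the integration the abstract describes, the following NumPy code runs a standard Kalman-filter assimilation loop in which a learned correction term compensates the structural error of an imperfect linear forecast model. The toy dynamics, shapes, and function names are assumptions, and the paper's actual scheme also covers variational DA.

    # Sketch: Kalman-filter loop where a learned correction term fixes the
    # structural error of an imperfect forecast model M. All names, the toy
    # dynamics, and shapes are assumptions, not the paper's code.
    import numpy as np

    def kalman_update(x_f, P_f, y, H, R):
        """Standard analysis step: blend forecast x_f with observation y."""
        S = H @ P_f @ H.T + R
        K = P_f @ H.T @ np.linalg.inv(S)        # Kalman gain
        x_a = x_f + K @ (y - H @ x_f)           # analysis state
        P_a = (np.eye(len(x_f)) - K @ H) @ P_f  # analysis covariance
        return x_a, P_a

    def assimilate(x, P, M, Q, H, R, observations, nn_correction):
        """Forecast with imperfect M, add the NN correction, assimilate y."""
        for y in observations:
            x_f = M @ x + nn_correction(x)  # NN compensates model error
            P_f = M @ P @ M.T + Q
            x, P = kalman_update(x_f, P_f, y, H, R)
            # In the paper's scheme the NN is retrained iteratively as new
            # analyses arrive; a training step on (x, x_f) pairs goes here.
        return x, P

    # Toy usage: 2-D state, identity dynamics, zero correction placeholder.
    n = 2
    M, Q, H, R = np.eye(n), 0.01 * np.eye(n), np.eye(n), 0.1 * np.eye(n)
    ys = [np.array([1.0, 0.0]), np.array([1.1, 0.1])]
    x, P = assimilate(np.zeros(n), np.eye(n), M, Q, H, R, ys,
                      lambda s: np.zeros(n))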
3. Time-in-action RL
Authors: jiangcheng zhu, Zhepei Wang, Douglas Mcilwraith, Chao Wu, Chao Xu, Yike Guo. IET Cyber-Systems and Robotics (EI), 2019, Issue 1, pp. 28-37 (10 pages)
The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in their control-theoretic formalism. The key insight enabling this integration is to model an explicit time function that maps a state-action pair to the time the underlying controller takes to accomplish the action. In their framework, an action is described by its value (action value) and by the time it takes to perform (action time). The action value results from the RL policy at a given state; the action time is estimated by an explicit time model learnt from the measured activities of the underlying controller. The RL value network is then trained with the embedded time model to predict action time. The approach is tested on a variant of Atari Pong and shown to converge (see the sketch below).
Keywords: action, policy, agent
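A minimal sketch of the explicit time model described above, assuming a PyTorch regression from state-action pairs to measured controller execution time; the duration-based discount in the value target is one plausible way to embed the time model, not necessarily the authors' exact formulation. All names, shapes, and dimensions are illustrative.

    # Sketch of an explicit time model: a regressor from (state, action) to
    # the duration the low-level controller needs, plus one plausible way to
    # embed it, discounting the Bellman target by the continuous duration.
    # Names, shapes, and the discounting choice are assumptions.
    import torch
    import torch.nn as nn

    class TimeModel(nn.Module):
        """Regresses action time tau(s, a) from logged controller runs."""
        def __init__(self, state_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Softplus())  # durations are positive

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

    def time_discounted_target(r, tau, v_next, gamma=0.99):
        """Bellman target with gamma raised to the action's duration tau,
        so slower actions are discounted more heavily than faster ones."""
        return r + gamma ** tau * v_next

    # Fit tau on logged (state, action, measured_time) triples with MSE:
    tm = TimeModel(state_dim=4, action_dim=2)
    s, a = torch.randn(8, 4), torch.randn(8, 2)
    measured = torch.rand(8) + 0.1
    loss = torch.nn.functional.mse_loss(tm(s, a), measured)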