Funding: Supported by the National Natural Science Foundation of China (No. 62206289).
Abstract: Little multi-agent reinforcement learning (MARL) research on Google Research Football (GRF) has focused on the 11-vs-11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark for this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that outperform the built-in bot at difficulty 1.0, training from scratch within 2 million steps. Our experiments serve as a reference for the expected performance of independent proximal policy optimization (IPPO), a state-of-the-art MARL algorithm in which each agent maximizes its own objective independently, across various training configurations. Meanwhile, we release our training framework, Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for experimenting on GRF and a simple-to-use population-based training framework for further improving agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
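As a rough illustration of the IPPO objective the abstract refers to, the sketch below has each agent optimize its own clipped-surrogate PPO loss on its own trajectories, with no shared critic or parameters. The network sizes, hyperparameters, observation/action dimensions, and synthetic batch are illustrative assumptions; this is not the Light-MALib implementation.

```python
# Minimal sketch of independent PPO (IPPO): every agent keeps its own policy
# and value network and runs its own clipped-surrogate update.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, N_AGENTS = 115, 19, 10  # GRF-like sizes (illustrative)

class Agent(nn.Module):
    def __init__(self):
        super().__init__()
        self.pi = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                nn.Linear(64, N_ACTIONS))   # policy logits
        self.v = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                               nn.Linear(64, 1))            # state value

def ppo_update(agent, opt, obs, act, logp_old, adv, ret, clip=0.2):
    """One clipped-surrogate update for a single independent agent."""
    dist = torch.distributions.Categorical(logits=agent.pi(obs))
    ratio = torch.exp(dist.log_prob(act) - logp_old)
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip, 1 + clip) * adv)
    loss = (-surrogate.mean()
            + 0.5 * ((agent.v(obs).squeeze(-1) - ret) ** 2).mean()
            - 0.01 * dist.entropy().mean())
    opt.zero_grad(); loss.backward(); opt.step()

agents = [Agent() for _ in range(N_AGENTS)]
opts = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in agents]

# Synthetic batch showing the call shapes; real data comes from GRF rollouts.
obs = torch.randn(32, OBS_DIM)
act = torch.randint(0, N_ACTIONS, (32,))
adv, ret = torch.randn(32), torch.randn(32)
for agent, opt in zip(agents, opts):
    with torch.no_grad():  # behaviour-policy log-probs at collection time
        logp_old = torch.distributions.Categorical(
            logits=agent.pi(obs)).log_prob(act)
    ppo_update(agent, opt, obs, act, logp_old, adv, ret)
```

In population-based self-play, copies of such agents would be periodically frozen and added to an opponent pool; the details of that loop are specific to Light-MALib and omitted here.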
Funding: Supported by the EPSRC Grand Challenge grant "Managing Air for Green Inner Cities" (MAGIC), EP/N010221/1.
Abstract: In this paper, we propose a new methodology that integrates Neural Networks (NN) into Data Assimilation (DA). Focusing on structural model uncertainty, we propose a framework for coupling NNs with physical models through DA algorithms, improving both the assimilation process and the forecasting results. The NNs are iteratively trained as observational data are updated. The main DA models used here are the Kalman filter and variational approaches. The effectiveness of the proposed algorithm is validated by examples and by a sensitivity study.
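The sketch below illustrates one plausible reading of the hybrid scheme: a scalar Kalman filter whose forecast step is an imperfect physical model plus a learned correction, with the corrector retrained after each analysis on the increment the physical model failed to explain. The 1-D dynamics, noise levels, and training signal are illustrative assumptions, not the paper's exact setup.

```python
# Kalman filter with an iteratively trained NN correction for structural
# model error (hypothetical 1-D example).
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0); rng = np.random.default_rng(0)

def physics(x):       # imperfect physical model: misses the nonlinear drift
    return 0.9 * x
def truth_step(x):    # "true" dynamics generating the observations
    return 0.9 * x + 0.3 * np.sin(x)

corrector = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(corrector.parameters(), lr=1e-2)

x_true, x_a, P = 1.0, 1.0, 1.0   # truth, analysis mean, analysis variance
Q, R, H = 0.01, 0.05, 1.0        # model noise, obs noise, obs operator

for t in range(200):
    x_true = truth_step(x_true) + rng.normal(0, np.sqrt(Q))
    y = H * x_true + rng.normal(0, np.sqrt(R))       # new observation

    # Forecast: physics plus the NN's estimate of the structural error.
    x_prev = x_a
    with torch.no_grad():
        dx = corrector(torch.tensor([[x_prev]], dtype=torch.float32)).item()
    x_f, P_f = physics(x_prev) + dx, P + Q

    # Kalman analysis step.
    K = P_f * H / (H * P_f * H + R)
    x_a = x_f + K * (y - H * x_f)
    P = (1 - K * H) * P_f

    # Iterative training: fit the corrector so physics(x_prev) + correction
    # tracks the latest analysis (one illustrative choice of training signal).
    inp = torch.tensor([[x_prev]], dtype=torch.float32)
    tgt = torch.tensor([[x_a - physics(x_prev)]], dtype=torch.float32)
    loss = ((corrector(inp) - tgt) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The same pattern carries over to variational DA by adding the NN correction to the forecast term inside the cost function instead of the filter's forecast step.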
Abstract: The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in their control-theoretic formalism. The key insight enabling this integration is to model an explicit time function, mapping a state-action pair to the time its underlying controller takes to accomplish the action. In this framework, an action is described by its value (action value) and the time it takes to perform (action time). The action value results from the RL policy at a given state; the action time is estimated by an explicit time model learnt from the measured activities of the underlying controller. The RL value network is then trained with the embedded time model to predict action time. The approach is tested on a variant of Atari Pong and shown to be convergent.
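The sketch below shows one way the two components could fit together: a time model regressed on durations measured from the controller, and a value network whose TD target discounts by the predicted action time in semi-MDP fashion. The gamma**tau discounting, network sizes, and synthetic batch are my illustrative assumptions, not the authors' exact architecture.

```python
# Time-in-action sketch: an explicit time model t(s, a) fit to measured
# controller durations, embedded in the value network's TD target.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 4, 1, 0.99  # illustrative sizes

time_model = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.ReLU(),
                           nn.Linear(32, 1), nn.Softplus())  # durations > 0
value_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                          nn.Linear(32, 1))
t_opt = torch.optim.Adam(time_model.parameters(), lr=1e-3)
v_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def fit_time_model(s, a, measured_tau):
    """Regress the time model on durations logged from the controller."""
    pred = time_model(torch.cat([s, a], dim=-1))
    loss = ((pred - measured_tau) ** 2).mean()
    t_opt.zero_grad(); loss.backward(); t_opt.step()

def value_update(s, a, r, s_next):
    """TD update whose discount horizon is the predicted action time."""
    with torch.no_grad():
        tau = time_model(torch.cat([s, a], dim=-1))
        target = r + GAMMA ** tau * value_net(s_next)
    loss = ((value_net(s) - target) ** 2).mean()
    v_opt.zero_grad(); loss.backward(); v_opt.step()

# Synthetic batch showing the call shapes:
s, a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)
r, s_next = torch.randn(64, 1), torch.randn(64, STATE_DIM)
measured_tau = torch.rand(64, 1) + 0.5   # logged controller durations
fit_time_model(s, a, measured_tau)
value_update(s, a, r, s_next)
```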