Improving the intelligence of virtual entities is an important issue in Computer Generated Forces (CGFs) construction. Some traditional approaches try to achieve this by specifying how entities should react to prede...Improving the intelligence of virtual entities is an important issue in Computer Generated Forces (CGFs) construction. Some traditional approaches try to achieve this by specifying how entities should react to predefined conditions, which is not suitable for complex and dynamic environments. This paper aims to apply Monte Carlo Tree Search (MCTS) for the behavior modeling of CGF commander. By look-ahead reasoning, the model generates adaptive decisions to direct the whole troops to fight. Our main work is to formulate the tree model through the state and action abstraction, and extend its expansion process to handle simultaneous and durative moves. We also employ Hierarchical Task Network (HTN) planning to guide the search, thus enhancing the search efficiency. The final implementation is tested in an infantry combat simulation where a company commander needs to control three platoons to assault and clear enemies within defined areas. Comparative results from a series of experiments demonstrate that the HTN guided MCTS commander can outperform other commanders following fixed strategies.展开更多
A framework that integrates planning,monitoring and replanning techniques is proposed.It can devise the best solution based on the current state according to specific objectives and properly deal with the influence of...A framework that integrates planning,monitoring and replanning techniques is proposed.It can devise the best solution based on the current state according to specific objectives and properly deal with the influence of abnormity on the plan execution.The framework consists of three parts:the hierarchical task network(HTN)planner based on Monte Carlo tree search(MCTS),hybrid plan monitoring based on forward and backward and norm-based replanning method selection.The HTN planner based on MCTS selects the optimal method for HTN compound task through pre-exploration.Based on specific objectives,it can identify the best solution to the current problem.The hybrid plan monitoring has the capability to detect the influence of abnormity on the effect of an executed action and the premise of an unexecuted action,thus trigger the replanning.The norm-based replanning selection method can measure the difference between the expected state and the actual state,and then select the best replanning algorithm.The experimental results reveal that our method can effectively deal with the influence of abnormity on the implementation of the plan and achieve the target task in an optimal way.展开更多
Solving the optimization problem to approach a Nash Equilibrium point plays an important role in imperfect information games,e.g.,StarCraft and poker.Neural Fictitious Self-Play(NFSP)is an effective algorithm that lea...Solving the optimization problem to approach a Nash Equilibrium point plays an important role in imperfect information games,e.g.,StarCraft and poker.Neural Fictitious Self-Play(NFSP)is an effective algorithm that learns approximate Nash Equilibrium of imperfect-information games from purely self-play without prior domain knowledge.However,it needs to train a neural network in an off-policy manner to approximate the action values.For games with large search spaces,the training may suffer from unnecessary exploration and sometimes fails to converge.In this paper,we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP,called MC-NFSP,to improve the performance in real-time zero-sum imperfect-information games.With experiments and empirical analysis,we demonstrate that the proposed MC-NFSP algorithm can approximate Nash Equilibrium in games with large-scale search depth while the NFSP can not.Furthermore,we develop an Asynchronous Neural Fictitious Self-Play framework(ANFSP).It uses asynchronous and parallel architecture to collect game experience and improve both the training efficiency and policy quality.The experiments with th e games with hidden state information(Texas Hold^m),and the FPS(firstperson shooter)games demonstrate effectiveness of our algorithms.展开更多
Unmanned Aerial Vehicle(UAV)has emerged as a promising technology for the support of human activities,such as target tracking,disaster rescue,and surveillance.However,these tasks require a large computation load of im...Unmanned Aerial Vehicle(UAV)has emerged as a promising technology for the support of human activities,such as target tracking,disaster rescue,and surveillance.However,these tasks require a large computation load of image or video processing,which imposes enormous pressure on the UAV computation platform.To solve this issue,in this work,we propose an intelligent Task Offloading Algorithm(iTOA)for UAV edge computing network.Compared with existing methods,iTOA is able to perceive the network’s environment intelligently to decide the offloading action based on deep Monte Calor Tree Search(MCTS),the core algorithm of Alpha Go.MCTS will simulate the offloading decision trajectories to acquire the best decision by maximizing the reward,such as lowest latency or power consumption.To accelerate the search convergence of MCTS,we also proposed a splitting Deep Neural Network(sDNN)to supply the prior probability for MCTS.The sDNN is trained by a self-supervised learning manager.Here,the training data set is obtained from iTOA itself as its own teacher.Compared with game theory and greedy search-based methods,the proposed iTOA improves service latency performance by 33%and 60%,respectively.展开更多
With the complexity of the composition process and the rapid growth of candidate services,realizing optimal or near-optimal service composition is an urgent problem.Currently,the static service composition chain is ri...With the complexity of the composition process and the rapid growth of candidate services,realizing optimal or near-optimal service composition is an urgent problem.Currently,the static service composition chain is rigid and cannot be easily adapted to the dynamic Web environment.To address these challenges,the geographic information service composition(GISC) problem as a sequential decision-making task is modeled.In addition,the Markov decision process(MDP),as a universal model for the planning problem of agents,is used to describe the GISC problem.Then,to achieve self-adaptivity and optimization in a dynamic environment,a novel approach that integrates Monte Carlo tree search(MCTS) and a temporal-difference(TD) learning algorithm is proposed.The concrete services of abstract services are determined with optimal policies and adaptive capability at runtime,based on the environment and the status of component services.The simulation experiment is performed to demonstrate the effectiveness and efficiency through learning quality and performance.展开更多
Purpose-The purpose of this paper is to establish a version of a theorem that originated from population genetics and has been later adopted in evolutionary computation theory that will lead to novel Monte-Carlo sampl...Purpose-The purpose of this paper is to establish a version of a theorem that originated from population genetics and has been later adopted in evolutionary computation theory that will lead to novel Monte-Carlo sampling algorithms that provably increase the AI potential.Design/methodology/approach-In the current paper the authors set up a mathematical framework,state and prove a version of a Geiringer-like theorem that is very well-suited for the development of Mote-Carlo sampling algorithms to cope with randomness and incomplete information to make decisions.Findings-This work establishes an important theoretical link between classical population genetics,evolutionary computation theory and model free reinforcement learning methodology.Not only may the theory explain the success of the currently existing Monte-Carlo tree sampling methodology,but it also leads to the development of novel Monte-Carlo sampling techniques guided by rigorous mathematical foundation.Practical implications-The theoretical foundations established in the current work provide guidance for the design of powerful Monte-Carlo sampling algorithms in model free reinforcement learning,to tackle numerous problems in computational intelligence.Originality/value-Establishing a Geiringer-like theorem with non-homologous recombination was a long-standing open problem in evolutionary computation theory.Apart from overcoming this challenge,in a mathematically elegant fashion and establishing a rather general and powerful version of the theorem,this work leads directly to the development of novel provably powerful algorithms for decision making in the environment involving randomness,hidden or incomplete information.展开更多
文摘Improving the intelligence of virtual entities is an important issue in Computer Generated Forces (CGFs) construction. Some traditional approaches try to achieve this by specifying how entities should react to predefined conditions, which is not suitable for complex and dynamic environments. This paper aims to apply Monte Carlo Tree Search (MCTS) for the behavior modeling of CGF commander. By look-ahead reasoning, the model generates adaptive decisions to direct the whole troops to fight. Our main work is to formulate the tree model through the state and action abstraction, and extend its expansion process to handle simultaneous and durative moves. We also employ Hierarchical Task Network (HTN) planning to guide the search, thus enhancing the search efficiency. The final implementation is tested in an infantry combat simulation where a company commander needs to control three platoons to assault and clear enemies within defined areas. Comparative results from a series of experiments demonstrate that the HTN guided MCTS commander can outperform other commanders following fixed strategies.
基金supported by the National Natural Science Foundation of China(61806221).
文摘A framework that integrates planning,monitoring and replanning techniques is proposed.It can devise the best solution based on the current state according to specific objectives and properly deal with the influence of abnormity on the plan execution.The framework consists of three parts:the hierarchical task network(HTN)planner based on Monte Carlo tree search(MCTS),hybrid plan monitoring based on forward and backward and norm-based replanning method selection.The HTN planner based on MCTS selects the optimal method for HTN compound task through pre-exploration.Based on specific objectives,it can identify the best solution to the current problem.The hybrid plan monitoring has the capability to detect the influence of abnormity on the effect of an executed action and the premise of an unexecuted action,thus trigger the replanning.The norm-based replanning selection method can measure the difference between the expected state and the actual state,and then select the best replanning algorithm.The experimental results reveal that our method can effectively deal with the influence of abnormity on the implementation of the plan and achieve the target task in an optimal way.
基金National Key Research and Development Program of China(2017YFB1002503)Science and Technology Innovation 2030-“New Generation Artificial Intelligence”Major Project(2018AAA0100902),China.
文摘Solving the optimization problem to approach a Nash Equilibrium point plays an important role in imperfect information games,e.g.,StarCraft and poker.Neural Fictitious Self-Play(NFSP)is an effective algorithm that learns approximate Nash Equilibrium of imperfect-information games from purely self-play without prior domain knowledge.However,it needs to train a neural network in an off-policy manner to approximate the action values.For games with large search spaces,the training may suffer from unnecessary exploration and sometimes fails to converge.In this paper,we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP,called MC-NFSP,to improve the performance in real-time zero-sum imperfect-information games.With experiments and empirical analysis,we demonstrate that the proposed MC-NFSP algorithm can approximate Nash Equilibrium in games with large-scale search depth while the NFSP can not.Furthermore,we develop an Asynchronous Neural Fictitious Self-Play framework(ANFSP).It uses asynchronous and parallel architecture to collect game experience and improve both the training efficiency and policy quality.The experiments with th e games with hidden state information(Texas Hold^m),and the FPS(firstperson shooter)games demonstrate effectiveness of our algorithms.
基金the Artificial Intelligence Key Laboratory of Sichuan Province(Nos.2019RYJ05)National Natural Science Foundation of China(Nos.61971107).
文摘Unmanned Aerial Vehicle(UAV)has emerged as a promising technology for the support of human activities,such as target tracking,disaster rescue,and surveillance.However,these tasks require a large computation load of image or video processing,which imposes enormous pressure on the UAV computation platform.To solve this issue,in this work,we propose an intelligent Task Offloading Algorithm(iTOA)for UAV edge computing network.Compared with existing methods,iTOA is able to perceive the network’s environment intelligently to decide the offloading action based on deep Monte Calor Tree Search(MCTS),the core algorithm of Alpha Go.MCTS will simulate the offloading decision trajectories to acquire the best decision by maximizing the reward,such as lowest latency or power consumption.To accelerate the search convergence of MCTS,we also proposed a splitting Deep Neural Network(sDNN)to supply the prior probability for MCTS.The sDNN is trained by a self-supervised learning manager.Here,the training data set is obtained from iTOA itself as its own teacher.Compared with game theory and greedy search-based methods,the proposed iTOA improves service latency performance by 33%and 60%,respectively.
基金Supported by the National Natural Science Foundation of China(No.41971356,41671400,41701446)National Key Research and Development Program of China(No.2017YFB0503600,2018YFB0505500)Hubei Province Natural Science Foundation of China(No.2017CFB277)。
文摘With the complexity of the composition process and the rapid growth of candidate services,realizing optimal or near-optimal service composition is an urgent problem.Currently,the static service composition chain is rigid and cannot be easily adapted to the dynamic Web environment.To address these challenges,the geographic information service composition(GISC) problem as a sequential decision-making task is modeled.In addition,the Markov decision process(MDP),as a universal model for the planning problem of agents,is used to describe the GISC problem.Then,to achieve self-adaptivity and optimization in a dynamic environment,a novel approach that integrates Monte Carlo tree search(MCTS) and a temporal-difference(TD) learning algorithm is proposed.The concrete services of abstract services are determined with optimal policies and adaptive capability at runtime,based on the environment and the status of component services.The simulation experiment is performed to demonstrate the effectiveness and efficiency through learning quality and performance.
基金This work has been sponsored by EPSRC EP/D003/05/1“Amorphous Computing”and EPSRC EP/I009809/1“Evolutionary Approximation Algorithms for Optimization:Algorithm Design and Complexity Analysis”Grants.
文摘Purpose-The purpose of this paper is to establish a version of a theorem that originated from population genetics and has been later adopted in evolutionary computation theory that will lead to novel Monte-Carlo sampling algorithms that provably increase the AI potential.Design/methodology/approach-In the current paper the authors set up a mathematical framework,state and prove a version of a Geiringer-like theorem that is very well-suited for the development of Mote-Carlo sampling algorithms to cope with randomness and incomplete information to make decisions.Findings-This work establishes an important theoretical link between classical population genetics,evolutionary computation theory and model free reinforcement learning methodology.Not only may the theory explain the success of the currently existing Monte-Carlo tree sampling methodology,but it also leads to the development of novel Monte-Carlo sampling techniques guided by rigorous mathematical foundation.Practical implications-The theoretical foundations established in the current work provide guidance for the design of powerful Monte-Carlo sampling algorithms in model free reinforcement learning,to tackle numerous problems in computational intelligence.Originality/value-Establishing a Geiringer-like theorem with non-homologous recombination was a long-standing open problem in evolutionary computation theory.Apart from overcoming this challenge,in a mathematically elegant fashion and establishing a rather general and powerful version of the theorem,this work leads directly to the development of novel provably powerful algorithms for decision making in the environment involving randomness,hidden or incomplete information.