Multiple earth observing satellites need to communicate with each other to observe plenty of targets on the Earth together. The factors, such as external interference, result in satellite information interaction delay...Multiple earth observing satellites need to communicate with each other to observe plenty of targets on the Earth together. The factors, such as external interference, result in satellite information interaction delays, which is unable to ensure the integrity and timeliness of the information on decision making for satellites. And the optimization of the planning result is affected. Therefore, the effect of communication delay is considered during the multi-satel ite coordinating process. For this problem, firstly, a distributed cooperative optimization problem for multiple satellites in the delayed communication environment is formulized. Secondly, based on both the analysis of the temporal sequence of tasks in a single satellite and the dynamically decoupled characteristics of the multi-satellite system, the environment information of multi-satellite distributed cooperative optimization is constructed on the basis of the directed acyclic graph(DAG). Then, both a cooperative optimization decision making framework and a model are built according to the decentralized partial observable Markov decision process(DEC-POMDP). After that, a satellite coordinating strategy aimed at different conditions of communication delay is mainly analyzed, and a unified processing strategy on communication delay is designed. An approximate cooperative optimization algorithm based on simulated annealing is proposed. Finally, the effectiveness and robustness of the method presented in this paper are verified via the simulation.展开更多
A primary challenge of agent-based policy learning in complex and uncertain environments is escalating computational complexity with the size of the task space(action choices and world states) and the number of agents...A primary challenge of agent-based policy learning in complex and uncertain environments is escalating computational complexity with the size of the task space(action choices and world states) and the number of agents.Nonetheless,there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease,both individually and cooperatively.This ability to solve computationally intractable problems stems from both brain circuits for hierarchical representation of state and action spaces and learned policies as well as constraints imposed by social cognition.Using biologically derived mechanisms for state representation and mammalian social intelligence,we constrain state-action choices in reinforcement learning in order to improve learning efficiency.Analysis results bound the reduction in computational complexity due to stateion,hierarchical representation,and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes.Investigation of two task domains,single-robot herding and multirobot foraging,shows that theoretical bounds hold and that acceptable policies emerge,which reduce task completion time,computational cost,and/or memory resources compared to learning without hierarchical representations and with no social knowledge.展开更多
基金supported by the National Science Foundation for Young Scholars of China(6130123471401175)
文摘Multiple earth observing satellites need to communicate with each other to observe plenty of targets on the Earth together. The factors, such as external interference, result in satellite information interaction delays, which is unable to ensure the integrity and timeliness of the information on decision making for satellites. And the optimization of the planning result is affected. Therefore, the effect of communication delay is considered during the multi-satel ite coordinating process. For this problem, firstly, a distributed cooperative optimization problem for multiple satellites in the delayed communication environment is formulized. Secondly, based on both the analysis of the temporal sequence of tasks in a single satellite and the dynamically decoupled characteristics of the multi-satellite system, the environment information of multi-satellite distributed cooperative optimization is constructed on the basis of the directed acyclic graph(DAG). Then, both a cooperative optimization decision making framework and a model are built according to the decentralized partial observable Markov decision process(DEC-POMDP). After that, a satellite coordinating strategy aimed at different conditions of communication delay is mainly analyzed, and a unified processing strategy on communication delay is designed. An approximate cooperative optimization algorithm based on simulated annealing is proposed. Finally, the effectiveness and robustness of the method presented in this paper are verified via the simulation.
基金supported by the Office of Naval Research under Multi-University Research Initiative (MURI) (No.N00014-08-1-0693)
文摘A primary challenge of agent-based policy learning in complex and uncertain environments is escalating computational complexity with the size of the task space(action choices and world states) and the number of agents.Nonetheless,there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease,both individually and cooperatively.This ability to solve computationally intractable problems stems from both brain circuits for hierarchical representation of state and action spaces and learned policies as well as constraints imposed by social cognition.Using biologically derived mechanisms for state representation and mammalian social intelligence,we constrain state-action choices in reinforcement learning in order to improve learning efficiency.Analysis results bound the reduction in computational complexity due to stateion,hierarchical representation,and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes.Investigation of two task domains,single-robot herding and multirobot foraging,shows that theoretical bounds hold and that acceptable policies emerge,which reduce task completion time,computational cost,and/or memory resources compared to learning without hierarchical representations and with no social knowledge.