Abstract
Path planning for multiple agents in an unknown large-scale scene requires an efficient, stable algorithm that also resolves collisions between agents, so that planning can run in real time in Web3D. To address this, the dynamic probabilistic single-chain convergent backtracking DP-Q(λ) algorithm is proposed. It applies direction-heuristic constraints and a high-reward or heavy-penalty training scheme. For a single agent, a probability p (a random number between 0 and 1) adjusts the reward and penalty values, which in turn determine the strategy for the next step; the agent also senses whether the next position is free and moves there only if it is, which provides collision avoidance while walking. This single-agent scheme is then extended to multi-agent path planning and implemented in Web3D. Experimental results show that the multi-agent real-time path planning achieved by the algorithm meets the efficiency and stability requirements of autonomous learning in Web3D.
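The abstract gives no pseudocode, so the following is only a rough sketch of the ingredients it names, not the authors' DP-Q(λ) algorithm. It assumes a square grid, a simplified tabular Q(λ) update with replacing eligibility traces (traces are not cut after exploratory moves), a reward scaled by a random p in [0, 1], a direction bonus based on Manhattan distance, and a free-cell check before each move. All function names, parameters, and reward magnitudes are hypothetical.

```python
import random

# Row/column offsets for the four grid moves: right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def is_free(cell, size, occupied):
    """True if the cell lies inside the grid and no obstacle or agent holds it."""
    r, c = cell
    return 0 <= r < size and 0 <= c < size and cell not in occupied


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def train_q_lambda(size, start, goal, occupied, episodes=300, alpha=0.5,
                   gamma=0.9, lam=0.8, epsilon=0.1, high_reward=100.0,
                   heavy_penalty=-100.0, seed=0):
    """Simplified tabular Q(lambda); every constant here is an assumption."""
    rng = random.Random(seed)
    q = {}  # (cell, action index) -> estimated value
    for _ in range(episodes):
        traces = {}
        cell = start
        for _ in range(4 * size * size):  # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)),
                        key=lambda i: q.get((cell, i), 0.0))
            nxt = (cell[0] + ACTIONS[a][0], cell[1] + ACTIONS[a][1])
            p = rng.random()  # probability p scaling the reward or penalty
            if not is_free(nxt, size, occupied):
                # Next position is not free: penalise and stay in place.
                reward, nxt = p * heavy_penalty, cell
            elif nxt == goal:
                reward = p * high_reward
            else:
                # Direction heuristic (assumed): +1 when moving closer, else -1.
                closer = manhattan(nxt, goal) < manhattan(cell, goal)
                reward = 1.0 if closer else -1.0
            best_next = 0.0 if nxt == goal else max(
                q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            delta = reward + gamma * best_next - q.get((cell, a), 0.0)
            traces[(cell, a)] = 1.0  # replacing trace
            for key in list(traces):
                q[key] = q.get(key, 0.0) + alpha * delta * traces[key]
                traces[key] *= gamma * lam  # decay every trace
            cell = nxt
            if cell == goal:
                break
    return q


def greedy_path(q, size, start, goal, occupied):
    """Follow the learned greedy policy, stopping at the goal or a blocked move."""
    path, cell = [start], start
    for _ in range(size * size):
        if cell == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q.get((cell, i), 0.0))
        nxt = (cell[0] + ACTIONS[a][0], cell[1] + ACTIONS[a][1])
        if not is_free(nxt, size, occupied):
            break
        cell = nxt
        path.append(cell)
    return path
```

In a multi-agent setting, one plausible use of this sketch is to add the current positions of the other agents to `occupied` before each move, so that the same free-cell check covers both static obstacles and moving agents. The abstract does not state how the paper handles this.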
Authors
Yan Fengting (闫丰亭)
Jia Jinyuan (贾金原)
School of Software Engineering, Shanghai 201804, China
Source
Journal of System Simulation (《系统仿真学报》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2019, No. 1, pp. 16-26 (11 pages)
Funding
National Natural Science Foundation of China, General Program (61272270)