Order dispatching by multi-agent reinforcement learning based on shared attention
Abstract Ride-hailing has become a popular way to travel because it is convenient and fast, and how to dispatch suitable orders more efficiently so that passengers reach their destinations is an active research topic. Many studies focus on training a single agent that assigns all orders centrally, so the vehicles themselves take no part in decision making. To address this problem, a multi-agent reinforcement learning algorithm based on shared attention, named SARL (Shared Attention Reinforcement Learning), was proposed. In the algorithm, the order dispatching problem was modeled as a Markov decision process, and multi-agent reinforcement learning with centralized training and decentralized execution was used to make every agent a decision-maker. A shared attention mechanism was also added so that agents share information and cooperate with each other. Comparison experiments with random matching (Random), a greedy algorithm (Greedy), Individual Deep-Q-Network (IDQN) and Q-learning MIXing network (QMIX) were conducted under different map scales, numbers of passengers and numbers of vehicles. Experimental results show that the SARL algorithm achieves the best time efficiency on maps of three different scales (100×100, 10×10 and 500×500) for both fixed and variable combinations of vehicles and passengers, which verifies its generalization performance and stability. The SARL algorithm can optimize the matching of vehicles and passengers, reduce passenger waiting time and improve passenger satisfaction.
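The abstract does not say what form the shared attention mechanism takes. As a rough illustration only, the sketch below assumes scaled dot-product attention in which every agent (vehicle) uses the same projection matrices and attends over the embeddings of all other agents. The function name `shared_attention`, the matrices `Wq`, `Wk`, `Wv`, and all dimensions are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def shared_attention(embeddings, Wq, Wk, Wv):
    """Assumed form of shared attention (not the paper's definition).

    All agents share Wq, Wk, Wv. Agent i attends over every other agent j != i
    and receives a context vector summarizing their information.
    Requires at least two agents.
    """
    queries = [matvec(Wq, e) for e in embeddings]
    keys = [matvec(Wk, e) for e in embeddings]
    values = [matvec(Wv, e) for e in embeddings]
    scale = math.sqrt(len(keys[0]))

    contexts, weights = [], []
    for i, q in enumerate(queries):
        others = [j for j in range(len(embeddings)) if j != i]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / scale for j in others]
        w = softmax(scores)
        # Weighted sum of the other agents' value vectors.
        ctx = [
            sum(w[k] * values[j][d] for k, j in enumerate(others))
            for d in range(len(values[0]))
        ]
        contexts.append(ctx)
        weights.append(dict(zip(others, w)))
    return contexts, weights


def rand_matrix(rows, cols):
    """Hypothetical random initializer, used only for this demo."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_agents, obs_dim, att_dim = 3, 4, 2  # illustrative sizes only
embeddings = rand_matrix(n_agents, obs_dim)
Wq, Wk, Wv = (rand_matrix(att_dim, obs_dim) for _ in range(3))
contexts, weights = shared_attention(embeddings, Wq, Wk, Wv)
```

In a centralized-training, decentralized-execution setup, context vectors of this kind might be fed into each agent's value estimate during training. That wiring is an assumption here and is not described in the abstract.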
Authors HUANG Xiaohui; YANG Kaiming; LING Jiahao (School of Information Engineering, East China Jiaotong University, Nanchang, Jiangxi 330013, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 5, pp. 1620-1624 (5 pages)
Funding National Natural Science Foundation of China (62062033); Natural Science Foundation of Jiangxi Province (20212BAB202008).
Keywords machine learning; deep reinforcement learning; attention mechanism; multi-agent reinforcement learning; vehicle order dispatching