Abstract
In urban rail transit train control systems, Train-to-Train (T2T) communication is a new-generation train communication mode that uses direct communication between trains to reduce communication delay and improve train operation efficiency. When T2T communication coexists with Train-to-Ground (T2G) communication, T2T links that reuse T2G resources cause interference. To address this while guaranteeing users' communication quality, this paper proposes an improved Advantage Actor-Critic-ac (A2C-ac) resource allocation algorithm based on Multi-Agent Deep Reinforcement Learning (MADRL). First, the system throughput is taken as the optimization objective and each T2T transmitter as an agent. The policy network uses a hierarchical output structure to guide each agent in selecting the spectrum resource to reuse and its power level. The agents then take the corresponding actions and interact with the T2T communication environment, obtaining the throughput of the T2G users and of the T2T users in the current time slot. The value network evaluates these two throughputs separately, and a weight factor β is used to build a customized weighted Temporal Difference (TD) error for each agent, so that the neural network parameters can be optimized flexibly. Finally, the agents use the trained model to jointly select the best spectrum resources and power levels. Simulation results show that, compared with the A2C and Deep Q-Network (DQN) algorithms, the proposed algorithm significantly improves convergence speed, T2T successful access rate, and throughput.
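The abstract describes the hierarchical action output and the β-weighted TD error only in words. The Python sketch below illustrates one plausible reading of both, under assumptions that are not taken from the paper: the policy emits one score per reusable T2G channel plus one row of power-level scores per channel (the hypothetical `channel_logits`/`power_logits` layout), the two critic estimates are combined as δ = β·δ_G + (1 − β)·δ_T, and every number in the usage lines is made up. It uses plain Python rather than a neural-network library so that only the arithmetic of one update step is shown; it is not the authors' A2C-ac implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index from a categorical distribution."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against rounding when sum(probs) < 1


def hierarchical_action(channel_logits, power_logits, rng):
    """Two-level action: choose a T2G channel to reuse, then a power level on it.

    channel_logits: K scores, one per reusable T2G channel (hypothetical layout).
    power_logits:   K rows of P scores; row k scores the power levels for channel k.
    Returns (k, p, log_prob) with log_prob = log pi(k) + log pi(p | k).
    """
    ch_probs = softmax(channel_logits)
    k = sample(ch_probs, rng)
    pw_probs = softmax(power_logits[k])
    p = sample(pw_probs, rng)
    return k, p, math.log(ch_probs[k]) + math.log(pw_probs[p])


def weighted_td_error(beta, gamma, r_g, r_t, v_g, v_g_next, v_t, v_t_next):
    """Per-agent TD error mixing the T2G and T2T critics (assumed convex mix)."""
    delta_g = r_g + gamma * v_g_next - v_g  # TD error of the T2G-throughput critic
    delta_t = r_t + gamma * v_t_next - v_t  # TD error of the T2T-throughput critic
    return beta * delta_g + (1.0 - beta) * delta_t, delta_g, delta_t


def a2c_losses(log_prob, delta, delta_g, delta_t):
    """Actor uses the weighted TD error as its advantage (treated as a constant);
    each critic head is regressed on its own TD error."""
    actor_loss = -delta * log_prob
    critic_loss = delta_g ** 2 + delta_t ** 2
    return actor_loss, critic_loss


# Usage with made-up numbers: 3 channels, 2 power levels, beta = 0.6.
rng = random.Random(0)
k, p, logp = hierarchical_action([0.2, 1.0, -0.5],
                                 [[0.0, 0.5], [1.0, -1.0], [0.3, 0.3]], rng)
delta, dg, dt = weighted_td_error(beta=0.6, gamma=0.9, r_g=2.0, r_t=1.0,
                                  v_g=10.0, v_g_next=9.0, v_t=5.0, v_t_next=5.0)
actor_loss, critic_loss = a2c_losses(logp, delta, dg, dt)
```

In this sketch, setting β = 1 makes an agent follow only the T2G critic and β = 0 only the T2T critic; intermediate values trade protection of the reused T2G links against T2T throughput, which is how the sketch interprets a "customized" weighting per agent.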
Authors
WANG Ruifeng
ZHANG Ming
HUANG Ziheng
HE Tao
Affiliations: School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; Automatic Control Institute, Lanzhou Jiaotong University, Lanzhou 730070, China
Source
Journal of Electronics & Information Technology
Indexed in: EI, CAS, CSCD, Peking University Core Journal
2024, No. 4, pp. 1306-1313 (8 pages)
Funding
Joint Fund for Basic Research in Railways of the National Natural Science Foundation of China (U2268206).
Keywords
Urban rail transit system
Resource allocation
Train-to-Train (T2T)
Multi-Agent Deep Reinforcement Learning (MADRL)
Advantage Actor-Critic-ac (A2C-ac) algorithm