Abstract
To address access congestion among massive machine type communication devices (mMTCDs) in the 5G massive machine type communication scenario, a double deep Q network with value-difference based exploration (VDBE-DDQN) algorithm is proposed. The algorithm targets the problem of mMTCDs accessing base stations (eNBs) in a multi-cell network, with the aim of reducing collisions when many mMTCDs attempt access at once, and the state transition process of the deep reinforcement learning algorithm is modeled as a Markov decision process. A double deep Q network is used to fit the target state-action value function, and a value-difference based exploration strategy lets the algorithm adapt to changes in the environment by drawing on both current conditions and expected future demand. Each mMTCD updates its exploration probability according to the difference between its current value function and the next-step value function estimated by the network, rather than following a uniform standard, so that the best eNB is selected for each mMTCD. Simulation results show that the proposed algorithm effectively improves the access success rate of the system.
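The two ingredients named in the abstract, the double deep Q network target and the value-difference based update of the exploration probability, can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the update below assumes the commonly cited VDBE form (a Boltzmann-style function of the absolute value difference, blended into the previous exploration probability). The function names, the parameters `sigma` and `delta`, and the use of plain lists of Q-values indexed by candidate eNB are all assumptions made for illustration, not the authors' implementation.

```python
import math


def vdbe_epsilon(eps, value_diff, sigma=1.0, delta=0.5):
    """Update one device's exploration probability from a value difference.

    Assumed VDBE form: a large |value_diff| pushes epsilon toward 1
    (explore more); a small one lets epsilon decay (exploit more).
    sigma (sensitivity) and delta (blend weight) are hypothetical defaults.
    """
    x = math.exp(-abs(value_diff) / sigma)
    f = (1.0 - x) / (1.0 + x)  # lies in [0, 1), grows with |value_diff|
    return delta * f + (1.0 - delta) * eps


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target for one transition.

    The online network selects the next action (here: the index of a
    candidate eNB) and the target network evaluates that action.
    """
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    # Hypothetical next-state Q-values over three candidate eNBs.
    target = double_dqn_target(1.0, 0.9, [0.2, 0.8, 0.5], [1.0, 0.4, 2.0])
    current_q = 0.9  # hypothetical current estimate for the chosen eNB
    eps = vdbe_epsilon(0.3, target - current_q)
    print(target, eps)
```

Keeping a separate exploration probability per device, rather than one shared schedule, is what the abstract means by not using a uniform standard: a device whose value estimates are still changing keeps exploring, while one whose estimates have settled mostly exploits its current best eNB.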
Authors
LI Xin (李昕); SUN Jun (孙君)
College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; Jiangsu Key Laboratory of Wireless Communications, Nanjing 210003, China
Source
Telecommunications Science (《电信科学》)
2022, No. 6, pp. 82-90 (9 pages)
Funding
National Natural Science Foundation of China (No. 61771255)
Open Project of a Provincial and Ministerial Key Laboratory (No. 20190904)