Abstract
Adaptive bitrate (ABR) algorithms are an effective way for video clients to improve users' quality of experience (QoE). Existing ABR algorithms suffer from frequent rebuffering, video stalls, low video quality, and inaccurate network throughput prediction. To address these problems, this paper proposes DRLA (Deep Reinforcement Learning based ABR), an ABR algorithm based on deep reinforcement learning. DRLA trains a neural network on real network bandwidth data and, using the client's buffer occupancy and the measured network throughput, requests video at the best bitrate from the video server. First, DRLA optimizes the loss function L with a baseline function method and adds an entropy regularization term to the update rule of the policy network, which encourages random exploration and keeps the loss function from converging to a local optimum. Second, it constrains the divergence between the new and old policies to limit the size of each update, which improves the robustness of the algorithm. Finally, it searches for the optimal policy through trust region optimization so that QoE is maximized. Experiments comparing DRLA with existing ABR algorithms on QoE metrics show that DRLA reduces training time, is more robust, and further improves user QoE; its effectiveness is also verified in a real-world environment.
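The update rule summarized in the abstract (a baseline-subtracted policy loss, an entropy bonus for exploration, and a bound on the divergence between the old and new policies enforced by a trust-region step) can be illustrated with a minimal sketch. It is not the authors' DRLA implementation. The function names, the per-state softmax over bitrate logits (standing in for the neural network that reads buffer occupancy and throughput), the Pensieve-style linear QoE used as a reward (`qoe_lin` with `mu=4.3`), and the hyperparameters `beta` and `delta` are all assumptions. The step direction is the plain gradient with a KL-bounded backtracking line search, a simplification of TRPO, which instead uses a natural-gradient direction computed with conjugate gradient.

```python
import math

# Hypothetical sketch: names, reward, and hyperparameters are illustrative assumptions.


def qoe_lin(bitrates_mbps, rebuffer_s, mu=4.3):
    """Assumed reward (Pensieve-style QoE_lin), not necessarily the paper's metric:
    total bitrate - mu * total rebuffering time - total bitrate switching."""
    quality = sum(bitrates_mbps)
    rebuffering = mu * sum(rebuffer_s)
    smoothness = sum(abs(b1 - b0) for b0, b1 in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - rebuffering - smoothness


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same bitrate levels."""
    return sum(pi * math.log(pi / max(qi, 1e-300)) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(old_logits, new_logits):
    """Average KL(pi_old || pi_new) over the sampled states."""
    return sum(kl(softmax(o), softmax(n)) for o, n in zip(old_logits, new_logits)) / len(old_logits)


def advantages(returns, baselines):
    """Baseline subtraction: A_t = G_t - b(s_t) lowers gradient variance."""
    return [g - b for g, b in zip(returns, baselines)]


def surrogate(new_logits, old_logits, actions, adv, beta):
    """Importance-weighted objective plus entropy bonus, averaged over states (maximized)."""
    total = 0.0
    for nl, ol, a, A in zip(new_logits, old_logits, actions, adv):
        p_new, p_old = softmax(nl), softmax(ol)
        total += p_new[a] / p_old[a] * A + beta * entropy(p_new)
    return total / len(actions)


def policy_gradient(old_logits, actions, adv, beta):
    """Analytic gradient of `surrogate` w.r.t. the logits, evaluated at the old policy."""
    n = len(actions)
    grads = []
    for ol, a, A in zip(old_logits, actions, adv):
        p = softmax(ol)
        h = entropy(p)
        grads.append([
            (A * ((1.0 if k == a else 0.0) - p[k])            # ratio term
             - beta * p[k] * (math.log(max(p[k], 1e-300)) + h)) / n  # entropy term
            for k in range(len(p))
        ])
    return grads


def trust_region_step(old_logits, step_dir, actions, adv, beta=0.01, delta=0.01,
                      init_scale=1.0, max_backtracks=10):
    """Backtracking line search: halve the step until the surrogate improves
    and the mean KL divergence from the old policy stays within delta."""
    base = surrogate(old_logits, old_logits, actions, adv, beta)
    scale = init_scale
    for _ in range(max_backtracks):
        cand = [[z + scale * d for z, d in zip(zs, ds)]
                for zs, ds in zip(old_logits, step_dir)]
        if (mean_kl(old_logits, cand) <= delta
                and surrogate(cand, old_logits, actions, adv, beta) > base):
            return cand, scale
        scale *= 0.5
    return old_logits, 0.0  # no acceptable step: keep the old policy
```

In use, each row of `old_logits` would come from the policy network for one observed state, `actions` would be the chosen bitrate indices, and `returns` would be discounted sums of per-chunk rewards; the baseline `b(s_t)` would typically come from a learned value network. Shrinking the step until `mean_kl` stays below `delta` is what keeps each update inside the trust region.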
Authors
YI Ling; LI Ze-ping (School of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China)
Source
Acta Electronica Sinica, 2022, No. 5, pp. 1192-1200 (9 pages)
Indexed in: EI; CAS; CSCD; Peking University Core Journals
Funding
National Natural Science Foundation of China (No. 61462014).
Keywords
adaptive bitrate algorithm
quality of experience
deep reinforcement learning
baseline function
entropy
trust region