Abstract
Current client-side DASH bitrate decisions rely on low-accuracy, environment-specific models to drive a fixed control algorithm, which makes it difficult to capture and reflect the dynamic changes of a real network. This paper adopts an algorithm that combines proximal policy optimization (PPO) from reinforcement learning with deep neural networks. The algorithm learns the dynamic characteristics of the network environment to make decisions, and adjusts the parameters of the policy network according to the output of the value network, gradually converging to the optimal policy. Experiments on real network trace datasets show that the algorithm achieves a higher quality of experience (QoE) than existing algorithms, with fewer buffer underflows and smooth video playback.
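The abstract does not give the paper's exact loss, so the following is only a minimal sketch of the standard PPO clipped surrogate objective, assuming the advantage is the sampled return minus the value network's estimate. The function name `ppo_clip_objective`, its parameters, and the clip range `eps=0.2` are illustrative assumptions, not taken from the paper.

```python
import math

def ppo_clip_objective(new_logp, old_logp, returns, values, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of decisions.

    All names and the default eps are hypothetical; the paper's own
    formulation is not stated in the abstract.
    """
    total = 0.0
    for lp_new, lp_old, ret, val in zip(new_logp, old_logp, returns, values):
        # Assumed advantage: sampled return minus the value-network estimate.
        advantage = ret - val
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio so a single update cannot move the policy too far.
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(new_logp)
```

In such a setup the policy network would be trained to maximize this quantity, while the value network supplies the baseline that determines whether a sampled bitrate decision is reinforced or discouraged.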
Authors
冯苏柳
姜秀华
Feng Su-liu; Jiang Xiu-hua (School of Communication and Information Engineering, Communication University of China, Beijing 100024, China)
Source
《中国传媒大学学报(自然科学版)》
2020, Issue 2, pp. 59-64, 83 (7 pages in total)
Journal of Communication University of China:Science and Technology