Abstract
Real-time bidding (RTB) is a widely adopted ad delivery mode in online display advertising. Because the RTB auction environment is highly dynamic, the optimal bidding strategy is difficult to obtain. To address this problem, this paper proposes a bidding strategy optimization method based on reinforcement learning (RL), which uses the policy optimization with penalized point probability distance (POP3D) algorithm to learn the optimal bidding strategy. In the POP3D-based bidding framework, the ad bidding process is modeled as an episodic Markov decision process. Each episode is divided into a fixed number of time steps, and the bid for each ad impression is jointly determined by its estimated click-through rate and a bidding factor. At each time step, the bidding agent adjusts the bidding factor according to the auction results of the previous time step, so that the bidding strategy can adapt to the highly dynamic auction environment. The goal of the bidding agent is to learn the optimal bidding factor adjustment policy. Experimental results on the iPinYou dataset show that, compared with the DRLB algorithm, the proposed bidding algorithm increases the number of clicks by 0.2% at budget ratios of 1/16 and 1/32, and increases the win rate by 1.8%, 1.0%, and 1.7% at budget ratios of 1/8, 1/16, and 1/32, respectively. The proposed method is also more stable. These results demonstrate the advantages of the proposed method.
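The bidding loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the linear bid form (bid = factor × pCTR), the multiplicative factor update, the second-price auction rule, and all names (compute_bid, adjust_factor, run_time_step, pop3d_surrogate). The surrogate function follows the commonly cited form of the POP3D objective (probability ratio times advantage, minus a penalty on the squared difference between the new and old probabilities of the taken action); it is not taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    factor: float  # current bidding factor (hypothetical scale)
    budget: float  # remaining budget for the episode


def compute_bid(pctr: float, factor: float) -> float:
    """Assumed linear form: the bid scales with the estimated CTR."""
    return factor * pctr


def adjust_factor(factor: float, action: float) -> float:
    """Assumed multiplicative update; action is a relative change, e.g. +0.03."""
    return factor * (1.0 + action)


def run_time_step(state: AgentState, impressions):
    """Replay one time step under an assumed second-price auction.

    impressions: list of (pctr, market_price, clicked) tuples.
    Returns the auction statistics the agent could observe before
    choosing its next factor adjustment.
    """
    wins, clicks, spend = 0, 0, 0.0
    for pctr, market_price, clicked in impressions:
        bid = compute_bid(pctr, state.factor)
        affordable = state.budget - spend >= market_price
        if bid >= market_price and affordable:
            wins += 1
            clicks += clicked
            spend += market_price  # winner pays the market price
    state.budget -= spend
    win_rate = wins / len(impressions) if impressions else 0.0
    return {"wins": wins, "clicks": clicks, "spend": spend, "win_rate": win_rate}


def pop3d_surrogate(pi_new: float, pi_old: float, advantage: float,
                    beta: float = 5.0) -> float:
    """Per-sample surrogate to maximize: ratio * A - beta * (pi_new - pi_old)^2.

    pi_new / pi_old are the probabilities of the taken action under the
    new and old policies; beta is a hypothetical penalty coefficient.
    """
    ratio = pi_new / pi_old
    penalty = (pi_new - pi_old) ** 2
    return ratio * advantage - beta * penalty


if __name__ == "__main__":
    state = AgentState(factor=1000.0, budget=100.0)
    step = [(0.05, 40.0, 1), (0.01, 20.0, 0), (0.10, 90.0, 0)]
    print(run_time_step(state, step))
    state.factor = adjust_factor(state.factor, 0.03)
    print(state)
```

In an episode, run_time_step would be called once per time step, and the policy trained with the surrogate would choose the action passed to adjust_factor from the returned statistics.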
Authors
李文权
齐琦
李霓
刘永娜
Li Wenquan; Qi Qi; Li Ni; Liu Yongna (College of Computer Science & Technology, Hainan University, Haikou 570228, China; College of Mathematics & Statistics, Hainan Normal University, Haikou 571158, China; College of Applied Science & Technology, Hainan University, Danzhou, Hainan 571737, China)
Source
《计算机应用研究》
CSCD
PKU Core (Peking University Core Journals)
2022, Issue 2, pp. 461-467 (7 pages)
Application Research of Computers
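Funding
National Natural Science Foundation of China (11861030)
National Key Research and Development Program of China (2018YFB1404400)
Natural Science Foundation of Hainan Province (2019RC176)
Scientific Research Foundation for Returned Overseas Scholars, Ministry of Education, 49th batch (2015-311)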
Keywords
display advertising
real-time bidding
bidding strategy
reinforcement learning