Abstract
In recommender systems, the post-click conversion rate is an important signal of user preference. However, traditional doubly robust estimators suffer from selection bias when predicting conversion rates, which leads to high estimator variance and bias. To address this problem, a general doubly robust debiasing learning model is proposed. First, a more stable doubly robust estimator is proposed for selection-bias scenarios: it adjusts the training weights of the imputation model to increase the penalty on low-propensity samples, which narrows the distribution gap between click samples and exposure samples and mitigates the bias of the doubly robust estimator. Second, inspired by the double deep Q-network in reinforcement learning, the double learning scheme is improved into an alternating learning model that exchanges gradient signals among the conversion rate prediction model, the click-through rate prediction model, and the imputation model, and uses them to guide the parameter updates of the networks, alleviating the problem of high model variance. In addition, during parameter updates, the regression problem of the prediction model is converted into a binary classification problem, which reduces the complexity of learning the prediction model and improves model interpretability. Experiments were conducted on two large-scale real-world datasets and one semi-synthetic dataset. Compared with existing debiasing methods, the proposed method achieves better recall and cumulative gain. Specifically, relative to the more robust doubly robust double learning model, which mainly targets variance reduction, it improves DCG@2 and Recall@2 by 4.43% and 4.97%, respectively; relative to the doubly robust joint learning model, which mainly targets bias reduction, it improves DCG@2 and Recall@2 by 7.21% and 10.11%, respectively.
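For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of the standard (textbook) doubly robust estimate of the average prediction error over all exposed user-item pairs. It is not the improved estimator proposed in the paper, whose exact weighting is not given here. The function name `dr_error`, its parameters, and the clipping constant `eps` are assumptions made for illustration.

```python
def dr_error(click, true_err, imputed_err, propensity, eps=1e-6):
    """Textbook doubly robust estimate of the mean prediction error.

    click       -- 1 if the exposed item was clicked (conversion label observed), else 0
    true_err    -- prediction error on the pair; only used where click == 1
    imputed_err -- error predicted by the imputation model for every exposed pair
    propensity  -- estimated click probability for every exposed pair
    eps         -- hypothetical lower bound on propensities to avoid division by zero
    """
    total = 0.0
    for o, e, e_hat, p in zip(click, true_err, imputed_err, propensity):
        p = min(max(p, eps), 1.0)
        # Clicked pairs correct the imputed error, reweighted by inverse propensity.
        correction = (e - e_hat) / p if o else 0.0
        total += e_hat + correction
    return total / len(click)


# Usage: when the imputed errors match the true errors on clicked pairs,
# the correction vanishes and the estimate is the mean imputed error.
estimate = dr_error(
    click=[1, 0, 1, 0],
    true_err=[0.2, 0.0, 0.4, 0.0],
    imputed_err=[0.2, 0.3, 0.4, 0.1],
    propensity=[0.5, 0.5, 0.5, 0.5],
)
```

The estimate is unbiased if either the imputed errors or the propensities are accurate. Small propensities inflate the correction term, which is the source of the variance problem the abstract refers to.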
Authors
MIAO Zhongqi (苗忠琦)
TONG Xiangrong (童向荣)
School of Computer and Control Engineering, Yantai University, Yantai 264005, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2024, Issue 11, pp. 2663-2672 (10 pages)
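Funding
Supported by the National Natural Science Foundation of China (62072392, 61972360)
Supported by the Major Science and Technology Innovation Project of Shandong Province (2019522Y020131)
Supported by the Natural Science Foundation of Shandong Province (ZR2020QF113)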
Keywords
recommendation system
selection bias
doubly robust learning
post-click conversion rate