Abstract
Advertising logs accumulated on the Internet are sparse, high-dimensional, and extremely imbalanced between positive and negative samples. As a result, manual feature extraction is time-consuming and laborious, and a single prediction model struggles to achieve good predictive performance. To address these problems, this paper proposes GBDT-Stacking, a click-through rate prediction model that combines a gradient boosted decision tree (GBDT) with Stacking. GBDT is used to extract and construct features automatically, and a Stacking ensemble model then predicts the click-through rate of online advertisements, improving on the performance of a single prediction model. Experiments on a real advertising data set show that the GBDT-Stacking ensemble model improves AUC by at least 4% over the comparison models.
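The abstract reports its results as AUC, which suits heavily imbalanced click logs better than accuracy does. As a reminder of what that metric measures, here is a minimal sketch in plain Python using the rank-sum formulation. The function name `auc` and the toy labels and scores are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: list of 0/1 values (1 = click); scores: predicted click scores.
    Tied scores receive the average of the ranks they span.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the current group of tied scores.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (made-up data): one of the four positive/negative pairs is misordered.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# One click in ten impressions, scored identically: always predicting
# "no click" would be 90% accurate, yet the AUC is 0.5.
print(auc([1] + [0] * 9, [0.0] * 10))  # 0.5
```

The second call shows why AUC is informative here: a model that cannot rank clicks above non-clicks scores 0.5 no matter how skewed the class distribution is.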
Authors
贺小娟
潘文捷
程宏
HE Xiao-juan; PAN Wen-jie; CHENG Hong (School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620; School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai 201209, China)
Source
《计算机工程与科学》
CSCD
Peking University Core Journal
2019, No. 12, pp. 2278-2284 (7 pages)
Computer Engineering & Science
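Funding
Shanghai Sailing Program for Young Science and Technology Talents, 2016 (16YF1415900)
First-Level Discipline Construction Project in Statistics, Shanghai Lixin University of Accounting and Finance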
Keywords
GBDT (gradient boosted decision tree)
Stacking ensemble learning
SMOTE (synthetic minority oversampling technique)
click-through rate