Abstract
Click-through rate (CTR) prediction is one of the most common applications in recommender systems. It models user attributes and related behavioral information to estimate the probability that a user will click on a given item in a particular context, and its accuracy directly affects both user experience and platform revenue. CTR prediction methods commonly used in industry today include GBDT+LR (Gradient Boosting Decision Tree + Logistic Regression), FM (Factorization Machine), DNN (Deep Neural Network), and Transformer. Among these, FM stands out because it handles high-dimensional sparse matrices well and has linear computational complexity. It also has shortcomings: it captures only shallow feature interactions, and its feature-interaction weights have limited expressive power. This paper reports extensive comparative experiments on FM and eight of its derived models, run on three public datasets with different ratios of positive to negative samples, and examines the strengths and weaknesses of these models on the CTR problem and how they might be improved.
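The linear-complexity claim can be illustrated with the standard second-order FM formulation; the abstract itself gives no implementation, so what follows is only a minimal sketch under that assumption. For an input x with n features, a bias w0, linear weights w, and k-dimensional latent vectors V, the pairwise term sum over i<j of <V_i, V_j> x_i x_j can be rewritten as 0.5 * sum over f of [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2], which costs O(kn) instead of O(kn^2). The function names and the example values are hypothetical and chosen purely for illustration.

```python
import math


def fm_score_naive(x, w0, w, V):
    """Second-order FM score using the explicit pairwise sum: O(k * n^2)."""
    n = len(x)
    k = len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score


def fm_score_linear(x, w0, w, V):
    """Same score using the sum-of-squares identity: O(k * n)."""
    n = len(x)
    k = len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = 0.0      # sum_i V[i][f] * x[i]
        s_sq = 0.0   # sum_i (V[i][f] * x[i])^2
        for i in range(n):
            t = V[i][f] * x[i]
            s += t
            s_sq += t * t
        score += 0.5 * (s * s - s_sq)
    return score


def sigmoid(z):
    """Map a raw FM score to a click probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy input: 4 features, latent dimension k = 2.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]

naive = fm_score_naive(x, w0, w, V)
fast = fm_score_linear(x, w0, w, V)
ctr = sigmoid(fast)
```

Both functions should agree up to floating-point error; the second visits each feature once per latent dimension, and with sparse inputs only the nonzero features need to be visited, which is why FM scales to high-dimensional sparse data.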
Author
任睿杰 (REN Rui-jie), School of Electronic Information, Yangtze University, Jingzhou 434100, China
Source
《电脑与信息技术》 (Computer and Information Technology)
2023, No. 5, pp. 72-75 (4 pages)