Abstract
Consumption behavior prediction is valuable in marketing, and its quality depends mainly on feature engineering and algorithmic modeling. Addressing feature extraction and new-feature discovery, this paper proposes a feature extraction method that combines fixed-length and variable-length sliding windows, and a feature crossing method based on prior knowledge and matrix factorization. The feature extraction method accounts for sample imbalance and users' consumption habits, extracting more samples and attaching a time attribute to the features. The feature crossing method accounts for the implicit associations between items and users, extracting new, correlated features. To address the poor predictive performance of single models, a stacking strategy is used to build an ensemble learning model: XGBoost, random forest, and gradient boosting decision tree (GBDT) serve as base learners that transform the features, and logistic regression serves as the meta-learner that predicts user consumption behavior. Experimental results show that the proposed feature engineering method clearly improves precision across several models, and that the ensemble model predicts better than any single model.
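As a rough illustration of combining fixed-length and variable-length sliding windows, the sketch below slides a fixed-length (feature span, label span) pair across a purchase log to generate several labeled samples per user, and computes counts over the last 1, 3, and 7 days before each cutoff so that every feature carries a time attribute. The abstract does not give the paper's actual window sizes, features, or data schema, so the function names, the `(user, date)` event format, and the spans used here are all assumptions made for illustration.

```python
from datetime import date, timedelta

def fixed_windows(start, end, feature_days, label_days, step_days):
    """Slide a fixed-length (feature span, label span) pair across [start, end]."""
    windows = []
    feat_start = start
    while True:
        label_start = feat_start + timedelta(days=feature_days)
        label_end = label_start + timedelta(days=label_days)  # exclusive bound
        if label_end > end + timedelta(days=1):
            break
        windows.append((feat_start, label_start, label_end))
        feat_start += timedelta(days=step_days)
    return windows

def variable_window_counts(events, user, cutoff, spans=(1, 3, 7)):
    """Count a user's events in the last k days before `cutoff`, for each k."""
    feats = {}
    for k in spans:
        lo = cutoff - timedelta(days=k)
        feats[f"cnt_last_{k}d"] = sum(
            1 for u, d in events if u == user and lo <= d < cutoff
        )
    return feats

def build_samples(events, start, end, feature_days=7, label_days=1, step_days=1):
    """One sample per (fixed window, user): time-tagged counts plus a 0/1 label."""
    users = sorted({u for u, _ in events})
    samples = []
    for _, label_start, label_end in fixed_windows(
        start, end, feature_days, label_days, step_days
    ):
        for user in users:
            feats = variable_window_counts(events, user, label_start)
            # Label is 1 if the user has any event inside the label span.
            label = int(any(
                u == user and label_start <= d < label_end for u, d in events
            ))
            samples.append((user, label_start, feats, label))
    return samples

# Hypothetical toy purchase log: (user id, purchase date).
events = [
    ("u1", date(2022, 1, 3)),
    ("u1", date(2022, 1, 8)),
    ("u2", date(2022, 1, 7)),
    ("u1", date(2022, 1, 9)),
]
samples = build_samples(events, date(2022, 1, 1), date(2022, 1, 9))
```

Sliding the fixed window one day at a time turns nine days of log data into two labeled samples per user instead of one, which is one plausible way to obtain more samples from the same data.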
Authors
贾志强
李涛
乐金祥
JIA Zhi-qiang; LI Tao; YUE Jin-xiang (School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China)
Source
《计算机技术与发展》
2022, No. 5, pp. 141-146 (6 pages)
Computer Technology and Development
Funding
National Natural Science Foundation of China (61702383)
Major Project of the Hubei Provincial Department of Education (17ZD014).