Abstract
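[Objective] Far fewer consumers buy a product than do not, so online shopping intention prediction is a typical imbalanced-data classification problem. This study addresses that problem in order to raise the classification accuracy of online shopping intention prediction. The main difficulty is that minority-class samples are recognized far less accurately than majority-class samples.

[Methods] A cost-sensitive Light Gradient Boosting Machine (LightGBM) model based on Bayesian optimization is proposed. The method has three steps:
1. The misclassification cost is introduced as a penalty factor to modify the LightGBM loss function.
2. Threshold moving lowers the model's classification threshold, which improves prediction accuracy on minority-class samples.
3. A Bayesian optimization algorithm tunes the misclassification cost parameter, the classification threshold and the other parameters.

[Results] Comparative experiments were run on five typical imbalanced datasets selected from the KEEL repository. The improved LightGBM model was compared against three groups of models:
- Against the standard LightGBM model, it raised both AUC and G-mean by about 10%.
- Against the genetic-algorithm-optimized and particle-swarm-optimized cost-sensitive LightGBM models, it generally raised AUC and G-mean by about 4%.
- Against the ADASYN-LightGBM and BorderlineSMOTE-LightGBM models, it generally raised AUC and G-mean by about 3%.

[Conclusions] The approach rests on cost-sensitive learning. It adds the misclassification cost to the LightGBM loss function as a penalty factor and lowers the classification threshold by threshold moving. Bayesian optimization then tunes the misclassification cost parameter, the classification threshold and the other parameters of the cost-sensitive LightGBM model. Together these steps give higher prediction accuracy on minority-class samples and improve the classification accuracy of online shopping intention prediction.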
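The first two method steps can be illustrated with a minimal sketch. This is not the authors' implementation.

The abstract does not state the exact form of the cost-modified loss. The code therefore assumes a cost-weighted binary log loss:
- The hypothetical parameter `cost_pos` multiplies the loss on minority-class (label 1) samples.
- `cost_neg` multiplies the loss on majority-class (label 0) samples.

`cost_sensitive_grad_hess` returns the per-sample gradient and Hessian with respect to the raw score. That pair is what a custom gradient-boosting objective has to supply. The hypothetical helper `predict_with_threshold` shows threshold moving.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cost_sensitive_logloss(y: int, z: float, cost_pos: float, cost_neg: float) -> float:
    """ASSUMED loss for one sample (label y in {0, 1}, raw score z):
    L = -[cost_pos * y * log(p) + cost_neg * (1 - y) * log(1 - p)],  p = sigmoid(z).
    Uses -log(p) = softplus(-z) and -log(1 - p) = softplus(z) for stability.
    """
    return cost_pos * softplus(-z) if y == 1 else cost_neg * softplus(z)


def cost_sensitive_grad_hess(y_true, raw_scores, cost_pos=5.0, cost_neg=1.0):
    """Per-sample derivatives of the assumed loss with respect to the raw score z:
    dL/dz   = w * (p - y)
    d2L/dz2 = w * p * (1 - p),  where w = cost_pos if y == 1 else cost_neg.
    """
    grad, hess = [], []
    for y, z in zip(y_true, raw_scores):
        p = sigmoid(z)
        w = cost_pos if y == 1 else cost_neg  # misclassification cost acting as penalty factor
        grad.append(w * (p - y))
        hess.append(w * p * (1.0 - p))
    return grad, hess


def predict_with_threshold(raw_scores, threshold=0.3):
    """Threshold moving: label a sample positive (minority class) when
    sigmoid(score) >= threshold, with threshold below the default 0.5."""
    return [1 if sigmoid(z) >= threshold else 0 for z in raw_scores]
```

With `cost_pos = cost_neg = 1`, the derivatives reduce to the standard logistic-loss pair p - y and p(1 - p). The only effect of the cost is to scale both derivatives on minority samples, so the boosted trees weight their errors more heavily.

LightGBM's scikit-learn interface accepts a callable `objective(y_true, y_pred)` that returns gradient and Hessian arrays. A NumPy wrapper around this function could serve that role, but that wiring is an assumption, not something the abstract describes.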
[Objective] Online shopping intention prediction is a typical imbalanced-data classification problem, because far fewer consumers buy goods than do not. This paper aims to solve the problem that minority-class samples are recognized much less accurately than majority-class samples.

[Methods] This paper proposes a cost-sensitive LightGBM (light gradient boosting machine) model based on Bayesian optimization. The method has three steps:
1. The misclassification cost is introduced as a penalty factor to modify the loss function of LightGBM.
2. Threshold moving lowers the model's classification threshold, which improves prediction accuracy for minority-class samples.
3. A Bayesian optimization algorithm tunes the misclassification cost parameter, the classification threshold and the other parameters.

[Results] Five typical imbalanced datasets were selected from the KEEL repository. To verify the effectiveness of the proposed algorithm, it was compared with five baselines:
- the standard LightGBM algorithm;
- a genetic-algorithm-optimized cost-sensitive LightGBM algorithm;
- a particle-swarm-optimized cost-sensitive LightGBM algorithm;
- ADASYN-LightGBM (ADASYN: adaptive synthetic sampling approach);
- BorderlineSMOTE-LightGBM (BorderlineSMOTE: borderline synthetic minority oversampling technique).

AUC (area under the curve) and G-mean (geometric mean) were the evaluation metrics. The final results were obtained after 100 iterations with ten-fold cross-validation. The main findings were:
- Against the standard LightGBM model, the cost-sensitive LightGBM model raised both AUC and G-mean by about 10%. Cost-sensitive learning therefore markedly improves LightGBM's classification performance on imbalanced data.
- Against the genetic-algorithm-optimized and particle-swarm-optimized cost-sensitive LightGBM models, the Bayesian-optimized cost-sensitive LightGBM model generally raised AUC and G-mean by about 4%. This shows that Bayesian optimization has an advantage in tuning the parameters of cost-sensitive LightGBM.
- Against the ADASYN-LightGBM and BorderlineSMOTE-LightGBM models, the Bayesian-optimized cost-sensitive LightGBM model generally raised AUC and G-mean by about 3%. It therefore classifies imbalanced data better than combining LightGBM with either of the two oversampling methods.

The model was then tested on consumer online shopping intention. The data are consumer behavior records from the Jingdong platform: the historical interactions among consumers, commodities, categories and stores released for the Jingdong JDATA algorithm competition, covering February 1, 2018 to April 15, 2018. The final results were a G-mean of 0.913, an AUC of 0.920 and an F1 score of 0.692. The model outperformed two other studies that reported predictions on the same dataset.

[Conclusions] To address imbalanced data in online shopping intention research, the paper proposes an online shopping intention prediction model based on Bayesian-optimized cost-sensitive LightGBM. Drawing on cost-sensitive learning, the model adds the misclassification cost to the LightGBM loss function as a penalty factor and lowers the classification threshold by threshold moving, which improves prediction accuracy for minority-class samples. Bayesian optimization then tunes the misclassification cost parameter, the classification threshold and the other parameters.

Experiments on the KEEL datasets compared the model with the standard LightGBM, genetic-algorithm-optimized cost-sensitive LightGBM, particle-swarm-optimized cost-sensitive LightGBM, ADASYN-LightGBM and BorderlineSMOTE-LightGBM models. The Bayesian-optimized cost-sensitive LightGBM model proved effective and held an advantage over all of them on imbalanced data. Empirical results on the Jingdong consumer behavior dataset show that it predicts consumers' online shopping intention better.
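The abstract reports AUC, G-mean and F1 and argues that lowering the classification threshold helps minority-class recognition. Both points can be checked on a toy example. The sketch below is pure Python. The labels, probabilities and function names are invented for illustration and are not the paper's data. AUC is computed from its pairwise (Mann-Whitney) definition, which is quadratic in sample size and only suitable for small examples.

```python
import math


def apply_threshold(probs, threshold):
    """Hard labels from positive-class probabilities (1 = minority class)."""
    return [1 if p >= threshold else 0 for p in probs]


def confusion_counts(y_true, y_pred):
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp, fn, tn, fp


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1(y_true, y_pred):
    tp, fn, _, fp = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data: 2 minority samples, 6 majority samples.
y = [1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.45, 0.35, 0.40, 0.20, 0.15, 0.10, 0.05, 0.30]

for t in (0.5, 0.3):
    preds = apply_threshold(probs, t)
    print(t, round(g_mean(y, preds), 3), round(f1(y, preds), 3))
print("AUC", round(auc(y, probs), 3))
```

At the default threshold of 0.5, no sample is predicted positive, so both G-mean and F1 are 0.

Lowering the threshold to 0.3 recovers both minority samples at the cost of two false positives:
- G-mean becomes sqrt(1 × 4/6) ≈ 0.816.
- F1 becomes 2/3.

AUC is about 0.917 either way, because it depends only on how the scores are ranked, not on the threshold.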
Authors
LUO Mi (罗咪), QIU Yihui (邱一卉), LIN Jianzong (林建宗)
School of Economics and Management, Xiamen University of Technology, Xiamen 361005, China
Source
Journal of Xiamen University: Natural Science (《厦门大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2024, No. 2, pp. 232-240 (9 pages)
Funding
National Natural Science Foundation of China (7180040248)
Natural Science Foundation of Fujian Province (2022J011261)