VOTCL and the Study of Its Application on Cross-Selling Problems
Abstract: Cross-selling has become an important means for enterprises to make profits. The authors first describe a typical cross-selling model, followed by an analysis showing that class imbalance and cost sensitivity usually co-exist in data sets collected from this domain. The central issue in real-world cross-selling applications is identifying potential cross-selling customers, yet prediction performance suffers because class imbalance and cost sensitivity arise simultaneously. To address this problem, an optimal-threshold-based voting method called VOTCL is proposed. In the first stage, VOTCL generates a number of class-balanced training data sets by combining under-sampling and over-sampling techniques; in the second stage, a base learner is trained on each of these data sets; finally, VOTCL obtains the decision model by combining the base learners with the proposed optimal-threshold-based voting scheme. On the cross-selling data set of the PAKDD 2007 data mining competition, VOTCL achieves an AUC of 0.6037. The ensemble model also outperforms a single base learner, which to some extent demonstrates the effectiveness of the proposed optimal-threshold-based voting scheme.
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2010, No. 9, pp. 1539-1547 (9 pages)
Keywords: cross-selling; class-imbalance; cost-sensitive; optimal threshold based voting; support vector machine
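
The three stages named in the abstract (balanced resampling, one base learner per balanced set, threshold-based voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the optimal threshold is chosen or how votes are combined, so the sketch assumes each learner gets the validation threshold with the lowest total misclassification cost and that the ensemble score is the fraction of positive votes. The keywords point to support vector machines as base learners; a simple centroid-based linear scorer is swapped in here to keep the example dependency-free. All function names, the set size rule, and the cost values are hypothetical.

```python
import random

def mean_vec(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train_centroid_scorer(pos, neg):
    """Stand-in base learner (NOT an SVM): project onto the centroid difference."""
    mp, mn = mean_vec(pos), mean_vec(neg)
    w = [a - b for a, b in zip(mp, mn)]
    mid = [(a + b) / 2 for a, b in zip(mp, mn)]
    return lambda x: sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))

def balanced_sets(pos, neg, n_sets, rng):
    """Assumed resampling rule: under-sample negatives without replacement,
    over-sample positives by random duplication up to the same size."""
    size = max(len(pos), len(neg) // n_sets)  # hypothetical per-class size
    out = []
    for _ in range(n_sets):
        neg_s = rng.sample(neg, min(size, len(neg)))
        pos_s = [rng.choice(pos) for _ in range(len(neg_s))]
        out.append((pos_s, neg_s))
    return out

def best_threshold(scorer, val_x, val_y, cost_fn, cost_fp):
    """Assumed criterion: the candidate threshold with the lowest total
    misclassification cost on a non-empty validation set."""
    scores = [scorer(x) for x in val_x]
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)):
        cost = sum(
            cost_fn if (y == 1 and s < t) else cost_fp if (y == 0 and s >= t) else 0
            for s, y in zip(scores, val_y)
        )
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def fit_votcl_sketch(pos, neg, val_x, val_y, n_sets=5,
                     cost_fn=10.0, cost_fp=1.0, seed=0):
    """Train one scorer per balanced set and pair it with its own threshold."""
    rng = random.Random(seed)
    members = []
    for pos_s, neg_s in balanced_sets(pos, neg, n_sets, rng):
        scorer = train_centroid_scorer(pos_s, neg_s)
        threshold = best_threshold(scorer, val_x, val_y, cost_fn, cost_fp)
        members.append((scorer, threshold))
    return members

def vote_fraction(members, x):
    """Fraction of base learners whose score reaches their own threshold."""
    return sum(scorer(x) >= t for scorer, t in members) / len(members)
```

Calling `fit_votcl_sketch` with lists of positive and negative feature vectors plus a labelled validation set returns the ensemble members; `vote_fraction` then yields a value in [0, 1] per customer, which can be used to rank customers when computing a ranking measure such as AUC. Setting `cost_fn` higher than `cost_fp` reflects the assumption that missing a potential cross-selling customer costs more than a false alarm.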