Abstract
Cross-selling has become an important way for enterprises to make profits. The authors first describe a typical cross-selling model and then show, through analysis, that class imbalance and cost sensitivity usually co-exist in data sets collected from this domain. The central issue in real-world cross-selling applications is identifying potential cross-selling customers, and handling these two co-occurring problems is both the key to accurate customer prediction and one of its main difficulties. To address this, an optimal-threshold-based voting method called VOTCL is proposed. In the first stage, VOTCL generates a number of class-balanced training data sets by combining over-sampling and under-sampling techniques. In the second stage, a base learner is trained on each of the balanced data sets. Finally, VOTCL combines the base learners into the final decision model with the proposed optimal-threshold-based voting scheme. On the cross-selling data set provided by the PAKDD 2007 data mining competition, VOTCL achieves an AUC of 0.6037. The ensemble model also outperforms a single base learner, which to some extent demonstrates the effectiveness of the proposed optimal-threshold-based voting scheme.
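The three stages named in the abstract (balanced resampling, one base learner per balanced set, thresholded voting) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not define how the optimal threshold is chosen, so the sketch picks, per learner, the threshold that maximizes true-positive rate minus false-positive rate; a dependency-free nearest-centroid scorer stands in for the base learner (the keywords suggest a support vector machine); the function names and the toy data are hypothetical.

```python
import random

def balanced_sample(X, y, size, rng):
    """Return `size` positives and `size` negatives (assumed resampling rule)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]

    def draw(idx):
        # Under-sample a class that is large enough; otherwise over-sample
        # it by drawing with replacement.
        if len(idx) >= size:
            return rng.sample(idx, size)
        return [rng.choice(idx) for _ in range(size)]

    chosen = draw(pos) + draw(neg)
    return [X[i] for i in chosen], [y[i] for i in chosen]

def train_centroid_scorer(X, y):
    """Stand-in base learner: higher score means closer to the positive centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos = centroid([x for x, label in zip(X, y) if label == 1])
    c_neg = centroid([x for x, label in zip(X, y) if label == 0])

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return d_neg - d_pos

    return score

def best_threshold(score, X, y):
    """Pick the threshold maximizing TPR - FPR (an assumed criterion)."""
    scores = [score(x) for x in X]
    n_pos = sum(y)
    n_neg = len(y) - n_pos

    def tpr_minus_fpr(t):
        tp = sum(1 for s, label in zip(scores, y) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(scores, y) if s >= t and label == 0)
        return tp / n_pos - fp / n_neg

    return max(sorted(set(scores)), key=tpr_minus_fpr)

def fit_vote_ensemble(X, y, n_models=5, size=20, seed=0):
    """Train one scorer per balanced sample, then majority-vote thresholded outputs."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_models):
        Xb, yb = balanced_sample(X, y, size, rng)
        score = train_centroid_scorer(Xb, yb)
        # Thresholds are tuned on the original, imbalanced data.
        members.append((score, best_threshold(score, X, y)))

    def predict(x):
        votes = sum(1 for score, t in members if score(x) >= t)
        return 1 if 2 * votes > len(members) else 0

    return predict

# Hypothetical toy data: 5 positives near (2, 2), 40 negatives near the origin.
X_pos = [[2.0, 2.0], [2.2, 1.8], [1.8, 2.2], [2.1, 2.1], [1.9, 1.9]]
X_neg = [[(i % 8) * 0.1, (i // 8) * 0.1] for i in range(40)]
X = X_pos + X_neg
y = [1] * len(X_pos) + [0] * len(X_neg)

predict = fit_vote_ensemble(X, y)
```

Calling `predict([2.0, 2.0])` on this toy data yields the positive class and `predict([0.0, 0.0])` the negative class. A cost-sensitive variant would replace the threshold criterion with one that weights the two error types by their costs; the abstract does not say which criterion VOTCL uses.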
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journals (北大核心)
2010, No. 9, pp. 1539-1547 (9 pages)
Keywords
cross-selling
class-imbalance
cost-sensitive
optimal threshold based voting
support vector machine