Abstract
Customer-churn prediction in the telecom industry is a complex problem, and a single classifier with complex parameters is poorly suited to it. This paper applies the random forest, an ensemble classifier that can process large-scale data and tolerates noise well, to telecom customer-churn prediction. Diversity is a key factor in the performance of ensemble classifiers. We propose a new diversity measure based on the simplified proximity matrix of the random forest and, building on this measure, an improved ensemble pruning technique that prunes the base classifiers of the forest, yielding a smaller customer-churn prediction model with better generalization performance. Experimental results show that the new measure describes the diversity among individual classifiers better than other measures, and that the proposed model based on the pruned random forest achieves higher prediction accuracy, a smaller ensemble size, and higher efficiency. The method is a promising, feasible, and effective candidate for customer-churn prediction in telecom enterprises.
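The abstract does not spell out the diversity measure or the pruning rule, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes each tree is summarized by the leaf index it assigns to every sample (for instance, scikit-learn's `RandomForestClassifier.apply` returns such indices with shape `(n_samples, n_estimators)`, which would need transposing). The functions `tree_diversity` and `prune_by_diversity` are hypothetical names: the first counts how often two trees disagree on whether a pair of samples shares a leaf, and the second greedily keeps the most mutually diverse trees. Accuracy, which a real pruning scheme would also weigh, is ignored here.

```python
from itertools import combinations


def tree_diversity(leaves_a, leaves_b):
    """Fraction of sample pairs on which two trees disagree about
    'these two samples fall in the same leaf' (an assumed measure,
    not the one defined in the paper)."""
    pairs = list(combinations(range(len(leaves_a)), 2))
    if not pairs:
        return 0.0
    differ = sum(
        (leaves_a[i] == leaves_a[j]) != (leaves_b[i] == leaves_b[j])
        for i, j in pairs
    )
    return differ / len(pairs)


def prune_by_diversity(leaf_table, keep):
    """Greedy forward selection: start from tree 0, then repeatedly add
    the tree with the largest mean diversity to the trees kept so far."""
    selected = [0]
    remaining = set(range(1, len(leaf_table)))
    while len(selected) < keep and remaining:
        best = max(
            sorted(remaining),  # sorted so ties resolve to the lowest index
            key=lambda t: sum(
                tree_diversity(leaf_table[t], leaf_table[s]) for s in selected
            ) / len(selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): one row per tree, one leaf index per sample.
leaf_table = [
    [0, 0, 1, 1],  # tree 0
    [5, 5, 7, 7],  # tree 1: same partition as tree 0, different leaf ids
    [0, 1, 0, 1],  # tree 2
    [0, 0, 0, 1],  # tree 3
]

print(prune_by_diversity(leaf_table, keep=2))  # [0, 2]
```

In the toy data, tree 1 partitions the samples exactly as tree 0 does, so its diversity to tree 0 is zero and it is the first candidate a diversity-driven pruning step would discard.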
Source
《厦门大学学报(自然科学版)》 (Journal of Xiamen University: Natural Science)
CAS
CSCD
PKU Core Journal (北大核心)
2014, No. 6, pp. 817-823 (7 pages)
Funding
National Natural Science Foundation of China (71371162)
Xiamen University of Technology High-Level Talent Project (YSK120162)
Keywords
customer-churn prediction
random forest
combined classifier
pruning technology