Abstract
Accurate user churn prediction helps enterprises improve user retention, grow their user base, and increase profitability. Most existing churn prediction models are either a single model or a simple fusion of several models, and do not fully exploit the advantages of multi-model ensembles. Borrowing the idea of Bootstrap Sampling from random forests, this paper proposes an improved Stacking ensemble method and applies it to a real data set to predict user churn. Experimental comparison on the validation set shows that the proposed method outperforms every classical Stacking ensemble of the same structure on three metrics: F1-score for churned users, recall, and prediction accuracy. With an appropriate ensemble structure, its performance can exceed the best performance achieved by the base classifiers.
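The abstract does not say how Bootstrap Sampling is combined with Stacking, so the following is only a minimal sketch of the sampling idea itself, as it is used in random forests: each base learner receives a training set drawn with replacement from the original rows, and the rows that were never drawn form an out-of-bag set. The function names `bootstrap_sample` and `bootstrap_training_sets`, the `seed` parameter, and the use of plain lists for rows are assumptions made for illustration, not part of the paper.

```python
import random


def bootstrap_sample(n_rows, seed=None):
    """Draw n_rows indices with replacement (hypothetical helper).

    Returns (in_bag, out_of_bag): the sampled indices, possibly with
    repeats, and the indices that were never drawn.
    """
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


def bootstrap_training_sets(rows, n_learners, seed=0):
    """Build one resampled training set per base learner.

    Assumption: every base learner gets its own bootstrap replicate,
    as in random forests. How the paper wires these replicates into
    the Stacking structure is not stated in the abstract.
    """
    training_sets = []
    for k in range(n_learners):
        # A different seed per learner gives a different replicate.
        in_bag, _ = bootstrap_sample(len(rows), seed=seed + k)
        training_sets.append([rows[i] for i in in_bag])
    return training_sets


if __name__ == "__main__":
    rows = [("user_%d" % i, i % 2) for i in range(10)]  # toy (id, churned) rows
    for replicate in bootstrap_training_sets(rows, n_learners=3):
        print(len(replicate), "rows,", len(set(replicate)), "distinct")
```

Each replicate has the same size as the original data but typically contains repeated rows, which is what makes the base learners see different views of the same data set.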
Authors
YE Cheng (叶成)
ZHENG Hong (郑红)
CHENG Yun-hui (程云辉)
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 2027-2032 (6 pages)
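Funding
National Natural Science Foundation of China (61103115, 61103172)
Shanghai Municipal Science and Technology Commission, Science and Technology Innovation Action Plan, High-Tech Field Project (16511101000)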