Abstract
Leveraging the strengths of machine learning algorithms in classification prediction, this study empirically explores user churn prediction models for paid knowledge live streaming, analyzes the predictive feature variables, and provides a decision-making basis for user retention management. Using Zhihu Live as the data source, six features were collected along two dimensions, user value characteristics and user evaluation characteristics: the time of the user's most recent purchase, the average number of purchases per month, the average amount spent per purchase, the time of the first purchase, the rating, and the comment text. Prediction models were built with six machine learning algorithms, and their predictive performance was compared. The contribution of each feature variable to churn prediction was then analyzed, churned users were grouped into types according to the key variables, and corresponding retention strategies were proposed. The results show that the rating and the sentiment of the comment text contribute significantly to user churn prediction; the XGBoost model, based on ensemble learning, performs best overall, followed by random forest, which confirms the superior generalization performance of ensemble learning; and analysis of the variables most important for churn prediction yields four types of churned users.
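The four user-value features above resemble the recency, frequency, and monetary (RFM) indicators common in customer analytics. The Python sketch below shows one possible way to derive them from a raw purchase log; it is not the authors' code. The `Purchase` record, the `user_value_features` function, the snapshot date, the 30-day month, and expressing the first purchase time as days elapsed since the first purchase (`tenure_days`) are all hypothetical choices made for illustration. Rating and comment sentiment are omitted because they depend on the review data and the sentiment tool actually used.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Purchase:
    """Hypothetical purchase record: one user buying one Live session."""
    user_id: str
    day: date
    amount: float  # price paid (assumed unit: CNY)


def user_value_features(purchases, snapshot):
    """Compute per-user value features relative to a snapshot date.

    Assumptions (not from the paper): months are approximated as 30 days,
    and first purchase time is expressed as days since the first purchase.
    """
    by_user = defaultdict(list)
    for p in purchases:
        by_user[p.user_id].append(p)

    features = {}
    for uid, items in by_user.items():
        days = [p.day for p in items]
        first, last = min(days), max(days)
        tenure_days = (snapshot - first).days
        # Floor the observation window at one month so very new users
        # do not get an inflated monthly frequency (or divide by zero).
        months = max(tenure_days / 30.0, 1.0)
        features[uid] = {
            "recency_days": (snapshot - last).days,       # most recent purchase
            "monthly_frequency": len(items) / months,     # purchases per month
            "avg_amount": sum(p.amount for p in items) / len(items),  # spend per purchase
            "tenure_days": tenure_days,                   # first purchase time
        }
    return features


purchases = [
    Purchase("u1", date(2021, 1, 10), 10.0),
    Purchase("u1", date(2021, 3, 10), 20.0),
    Purchase("u2", date(2021, 6, 1), 30.0),
]
feats = user_value_features(purchases, snapshot=date(2021, 7, 1))
```

In this example `u1` last bought 113 days before the snapshot, while `u2` bought once 30 days before it; a feature table like `feats`, joined with rating and sentiment scores, could then be fed to whichever classifiers are being compared.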
Authors
Xing Shaoyan (邢绍艳)
Zhu Xuefang (朱学芳)
School of Information Management, Nanjing University, Nanjing 210023
Source
Journal of Information Resources Management (《信息资源管理学报》)
CSSCI
2022, Issue 4, pp. 121-130, 140 (11 pages in total)
Keywords
Machine learning
Knowledge live streaming
Paid knowledge
User churn
Prediction performance
User value
User evaluation