Abstract
In mobile phone user datasets, the classes of non-replacing users and phone-replacing users are severely imbalanced. Traditional data mining methods pursue overall accuracy when handling imbalanced data, which leads to low prediction accuracy for phone-replacing users. To address this problem, a phone replacement prediction method based on a hierarchical cost-sensitive decision tree is proposed. First, rough set theory is used to reduce the attributes of the original dataset and to compute the importance of each attribute. Next, the attributes are partitioned into blocks according to their importance to build a hierarchical structure. Finally, a cost-sensitive decision tree, constructed with the Gini index and misclassification cost as its splitting criterion, serves as the base classifier at each level. Three simulation experiments on customer data from a telecom operator show that the hierarchical cost-sensitive decision tree performs well on both the original imbalanced user dataset and the balanced user dataset obtained by undersampling.
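The abstract names the Gini index and misclassification cost as the splitting criterion but does not give the formula. The following is a minimal sketch of one way such a criterion could be combined, not the authors' method: the function names, the binary 0/1 labels (1 = phone-replacing user), the cost parameters `cost_fn` and `cost_fp`, and the mixing weight `alpha` are all assumptions made for illustration.

```python
from collections import Counter
from math import inf


def gini(labels):
    """Plain Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def leaf_cost(labels, cost_fn, cost_fp):
    """Lowest total misclassification cost if the node predicts one class.

    Labels are assumed binary: 1 = replacing user, 0 = non-replacing user.
    cost_fn: penalty for predicting 0 when the true label is 1.
    cost_fp: penalty for predicting 1 when the true label is 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Predicting 0 misclassifies every positive; predicting 1 every negative.
    return min(pos * cost_fn, neg * cost_fp)


def split_score(left, right, cost_fn, cost_fp, alpha=0.5):
    """Hypothetical criterion: weighted Gini mixed with normalized cost.

    Lower is better. alpha is an assumed mixing weight, not from the paper.
    """
    n = len(left) + len(right)
    weighted_gini = (len(left) * gini(left) + len(right) * gini(right)) / n
    total_cost = leaf_cost(left, cost_fn, cost_fp) + leaf_cost(right, cost_fn, cost_fp)
    # Divide by the larger unit cost so the cost term stays within [0, 1].
    normalized_cost = total_cost / (n * max(cost_fn, cost_fp))
    return alpha * weighted_gini + (1 - alpha) * normalized_cost


def best_threshold(values, labels, cost_fn, cost_fp, alpha=0.5):
    """Scan midpoints between distinct values; return (threshold, score)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_t, best_s = None, inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        s = split_score(left, right, cost_fn, cost_fp, alpha)
        if s < best_s:
            best_t, best_s = t, s
    return best_t, best_s
```

Setting `cost_fn` larger than `cost_fp` penalizes splits that leave replacing users in nodes labeled as non-replacing, which is the general idea behind cost-sensitive learning on imbalanced data. For example, `best_threshold(monthly_usage, replaced, cost_fn=5.0, cost_fp=1.0)` would pick a threshold on a hypothetical numeric attribute `monthly_usage`.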
Source
Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》)
Indexed in: CAS; PKU Core (北大核心)
2015, Issue 5, pp. 36-42 (7 pages)
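Funding
National Natural Science Foundation of China (61272060)
Natural Science Foundation of Chongqing (cstc2012jjA40032, cstc2013jcyjA40063)
Open Fund of the Chongqing/Ministry of Information Industry Key Laboratory of Computer Network and Communication Technology (CY-CNCL-2010-05)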
Keywords
hierarchical structure
decision tree
cost-sensitive
imbalanced data
phone replacement prediction