
Hierarchical cost-sensitive decision tree and its application in mobile phone replacement prediction (cited by: 5)
Abstract: In mobile phone user data sets, users who replace their phones and users who do not are severely imbalanced. Traditional data mining methods pursue overall accuracy when handling imbalanced data, which leads to low prediction accuracy for replacement users. To address this problem, a phone replacement prediction method based on a hierarchical cost-sensitive decision tree is proposed. First, rough set theory is used to reduce the attributes of the original data set and to compute the importance of each attribute. The attributes are then partitioned into blocks according to their importance to build a hierarchical structure. Finally, a cost-sensitive decision tree, constructed with the Gini index and misclassification cost as its splitting criterion, serves as the base classifier at each level. Three simulation experiments on customer data from a telecom operator show that the hierarchical cost-sensitive decision tree performs well on both the original imbalanced user data set and the balanced user data set obtained by undersampling.
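Source: Journal of Shandong University (Engineering Science), CAS, Peking University Core Journal, 2015, No. 5, pp. 36-42 (7 pages)
Funding: National Natural Science Foundation of China (61272060); Natural Science Foundation of Chongqing (cstc2012jjA40032, cstc2013jcyjA40063); Open Fund of the Chongqing / Ministry of Information Industry Key Laboratory of Computer Network and Communication Technology (CY-CNCL-2010-05)
Keywords: hierarchical structure; decision tree; cost sensitive; imbalanced data; phone replacement prediction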
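
The abstract names the Gini index and misclassification cost as the splitting criterion but does not give the formula. As a rough illustration only, the sketch below uses instance weighting, a common way to make the Gini index cost-sensitive: each sample is weighted by the cost of misclassifying its class before impurity is computed. This is a swapped-in technique, not necessarily the criterion used in the paper; the function names, the `class_cost` mapping, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def weighted_gini(labels, class_cost):
    """Gini impurity where each sample counts with the cost of its class.

    class_cost maps a class label to the (assumed) cost of misclassifying
    a sample of that class.
    """
    totals = defaultdict(float)
    for y in labels:
        totals[y] += class_cost[y]
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


def split_gain(labels, mask, class_cost):
    """Reduction in cost-weighted Gini impurity for a binary split.

    mask[i] is True when sample i goes to the left child.
    """
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]

    def weight(part):
        return sum(class_cost[y] for y in part)

    w_all = weight(labels)
    if w_all == 0:
        return 0.0
    children = (
        weight(left) * weighted_gini(left, class_cost)
        + weight(right) * weighted_gini(right, class_cost)
    ) / w_all
    return weighted_gini(labels, class_cost) - children


# Toy data: 9 non-replacement users (0) and 1 replacement user (1).
labels = [0] * 9 + [1]
equal_cost = {0: 1.0, 1: 1.0}   # plain Gini index
skewed_cost = {0: 1.0, 1: 9.0}  # assumed: missing a replacement user costs 9x
perfect_split = [False] * 9 + [True]

plain = weighted_gini(labels, equal_cost)
costed = weighted_gini(labels, skewed_cost)
gain = split_gain(labels, perfect_split, skewed_cost)
```

With equal costs the node looks almost pure because the majority class dominates; with the skewed costs the same node is treated as maximally impure, so a split that isolates the minority class is rewarded more strongly.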

