
ConfDTree: A Statistical Method for Improving Decision Trees (Cited by: 3)

Abstract: Decision trees have three main disadvantages: reduced performance when the training set is small; rigid decision criteria; and the fact that a single "uncharacteristic" attribute might "derail" the classification process. In this paper we present ConfDTree (Confidence-Based Decision Tree) -- a post-processing method that enables decision trees to better classify outlier instances. This method, which can be applied to any decision tree algorithm, uses easy-to-implement statistical methods (confidence intervals and two-proportion tests) to identify hard-to-classify instances and to propose alternative routes. The experimental study indicates that the proposed post-processing method consistently and significantly improves the predictive performance of decision trees, particularly for small, imbalanced, or multi-class datasets, for which an average improvement of 5%-9% in AUC is reported.
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 3, pp. 392-407 (16 pages)
Keywords: decision tree, confidence interval, imbalanced dataset
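The abstract describes the core mechanism at a high level: at each decision node, simple statistics flag instances whose attribute values fall too close to the split criterion as hard to classify, and alternative routes through the tree are then considered for them. The Python sketch below is a minimal illustration of that confidence-interval idea only; the function names, the standard-error band used as the interval, and the both-branches routing rule are illustrative assumptions, not the paper's exact ConfDTree procedure (which also employs two-proportion tests, not shown here).

```python
import math

# Illustrative sketch only: a simplified confidence-interval check at a single
# numeric decision node, in the spirit of treating instances that fall "too
# close" to a split threshold as hard to classify. The interval construction
# and routing rule are assumptions, not the published ConfDTree algorithm.

def within_confidence_band(value, threshold, node_values, z=1.96):
    """Return True if `value` lies inside a confidence band around `threshold`.

    `node_values` are the training-set values of the split attribute that
    reached this node; the band half-width is the z-scaled standard error of
    their mean, used here as a stand-in for the paper's confidence interval.
    """
    n = len(node_values)
    if n < 2:
        return True  # too little data at this node to trust a hard threshold
    mean = sum(node_values) / n
    var = sum((v - mean) ** 2 for v in node_values) / (n - 1)
    stderr = math.sqrt(var / n)
    return abs(value - threshold) <= z * stderr


def route(value, threshold, node_values):
    """Route an instance: follow one branch when the decision is confident,
    otherwise explore both branches (alternative routes)."""
    if within_confidence_band(value, threshold, node_values):
        return ["left", "right"]          # hard to classify: keep both routes
    return ["left"] if value <= threshold else ["right"]


if __name__ == "__main__":
    train_vals = [4.8, 5.1, 5.3, 5.0, 4.9, 5.2]
    print(route(5.05, 5.0, train_vals))   # near the threshold -> both branches
    print(route(7.5, 5.0, train_vals))    # clearly above -> right branch only
```

Running the sketch routes the near-threshold value down both branches and the clearly separated value down a single branch, which is the behavioral difference the abstract attributes to handling hard-to-classify instances.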