
A New Splitting Criterion of Decision Trees
Cited by: 6
Abstract: Classification is a core problem in data mining and machine learning. To achieve the highest possible classification accuracy, the choice of the splitting attribute at each node is critical when constructing a decision tree. Common splitting-attribute selection methods include the information entropy method and the GINI index method. This paper analyzes the advantages and disadvantages of the widely used entropy-based selection method, and proposes a splitting-attribute selection method based on the chi-squared test, which measures the statistical association between each condition attribute and the class label. Real examples and simulation experiments demonstrate the advantages of the proposed algorithm: the results show that it achieves a lower classification error rate than entropy-based methods.
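The core idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes categorical condition attributes, builds the contingency table of each attribute against the class label, and selects the attribute with the largest chi-squared statistic (the function and variable names are hypothetical).

```python
from collections import Counter

def chi_squared_stat(attr_values, labels):
    """Chi-squared statistic for the contingency table of one
    condition attribute against the class label."""
    n = len(labels)
    joint = Counter(zip(attr_values, labels))       # observed cell counts
    attr_counts = Counter(attr_values)              # row marginals
    label_counts = Counter(labels)                  # column marginals
    stat = 0.0
    for a, a_cnt in attr_counts.items():
        for c, c_cnt in label_counts.items():
            expected = a_cnt * c_cnt / n            # count under independence
            observed = joint.get((a, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def select_split_attribute(rows, attributes, labels):
    """Pick the attribute most strongly associated with the label,
    i.e. the one with the largest chi-squared statistic."""
    return max(
        attributes,
        key=lambda a: chi_squared_stat([row[a] for row in rows], labels),
    )
```

For example, an attribute that perfectly separates the classes yields a larger statistic than one that is independent of the label, so it is chosen as the split. Note that comparing raw chi-squared statistics across attributes with different numbers of values ignores degrees of freedom; a fuller implementation would compare p-values instead.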
Author: Liu Xingyi (刘星毅)
Affiliation: Qinzhou University (钦州学院)
Source: Computer Technology and Development (《计算机技术与发展》), 2008, No. 5, pp. 70-72 (3 pages)
Funding: Guangxi Natural Science Foundation (Guike 0640069)
Keywords: decision trees; splitting attributes; chi-squared test; information entropy

Co-cited literature: 29

Jointly cited literature: 54

Citing literature: 6

Secondary citing literature: 17
