
属性频率划分和信息熵离散化的决策树算法

Decision tree algorithm using attribute frequency splitting and information entropy discretization
Abstract: The decision tree is a common classification method in data mining. When constructing a decision tree, the measure used to select each node's splitting attribute directly affects classification performance. This paper measures attribute importance with an attribute frequency function from rough set theory, applies that measure both to splitting-attribute selection and to pre-pruning of the tree, and on this basis proposes a new decision tree learning algorithm. In addition, to handle numerical attributes, an improved entropy-based discretization algorithm is proposed that uses statistical properties of the data set as heuristic knowledge. Experimental results show that the new discretization method is markedly more efficient to compute, and that, compared with the entropy-based decision tree algorithm, the new decision tree algorithm produces simpler trees and achieves better classification performance.
Source: Computer Engineering and Applications (《计算机工程与应用》; CSCD, Peking University core journal), 2009, No. 12, pp. 153-156 (4 pages).
Funding: Natural Science Foundation of Guangxi No. 0481016; Scientific Research Fund of the Guangxi Education Department No. 2006149.
Keywords: decision tree; rough set; attribute frequency; information entropy; discretization
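The two ingredients named in the abstract can be illustrated with a small sketch. This is not the paper's exact algorithm, only a minimal illustration under common assumptions: binary entropy-based cutting of one numerical attribute (candidate cuts at midpoints between adjacent sorted values), and a discernibility-matrix style attribute frequency count as a rough-set importance measure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Entropy-based binary discretization of one numerical attribute:
    try the midpoint between each pair of adjacent distinct sorted values
    and return the cut minimizing the weighted class entropy of the two
    resulting intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_e = None, float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if e < best_e:
            best_cut, best_e = cut, e
    return best_cut, best_e

def attribute_frequency(table, decisions):
    """Rough-set style attribute frequency: for each condition attribute,
    count the object pairs with different decision values that the
    attribute discerns (i.e. differs on). Higher counts suggest greater
    attribute importance for splitting."""
    n, m = len(table), len(table[0])
    freq = [0] * m
    for i in range(n):
        for j in range(i + 1, n):
            if decisions[i] == decisions[j]:
                continue  # only pairs that must be discerned count
            for a in range(m):
                if table[i][a] != table[j][a]:
                    freq[a] += 1
    return freq
```

For example, `best_cut_point([1, 2, 3, 10, 11, 12], list("aaabbb"))` finds the cut 6.5, which separates the two classes with weighted entropy 0. The paper's actual method additionally uses data-set statistics to prune the candidate cuts and applies the frequency measure to pre-pruning as well.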
