
Decision Tree Algorithm Based on Rough Set

Cited by: 9
Abstract  For the problem that classical Rough-set-based classification algorithms, such as the value reduction algorithm, are not suitable for large data sets, this paper proposes a decision tree algorithm based on Rough sets. The algorithm takes a novel measure, the attribute classification rough degree, as the heuristic for choosing the attribute at a tree node; this measure characterizes an attribute's overall contribution to classification more comprehensively than other Rough-set measures such as the positive region, and is simpler to compute than information gain or information gain ratio. The algorithm adopts a new pruning method, predictive pruning (pre-pruning): before the attribute-selection computation, the variable precision positive region is used to revise the attribute's initial partition of the data at a tree node, thus more effectively eliminating the effect of noisy data on attribute selection and leaf-node generation. The algorithm also takes a simple and efficient method, tightly integrated with the decision tree algorithm, for detecting and handling inconsistent data, so it can process both consistent and inconsistent data effectively. Mining results on six data sets from the UCI machine learning repository show that the trees generated by the algorithm are smaller than those generated by ID3, and comparable in size to those generated by a decision tree algorithm that uses information gain ratio as its heuristic. The algorithm generates decision trees or classification rule sets in which every leaf node satisfies the given minimum confidence and support, is easy to implement with database technology, and is therefore suitable for large data sets.
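The attribute-selection idea described in the abstract can be illustrated with standard rough-set notions. The paper's own "attribute classification rough degree" measure is not defined in this record, so the sketch below uses the classical positive region and its variable-precision relaxation (following Ziarko) as stand-ins; the function names and the toy decision table are illustrative assumptions, not the authors' implementation.

```python
# Minimal rough-set sketch: partition a decision table by condition
# attributes, compute the (variable precision) positive region, and use the
# dependency degree gamma = |POS| / |U| as an attribute-quality measure.

from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by the values of attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, cond_attrs, decision, beta=1.0):
    """Indices classifiable with confidence >= beta (beta=1.0: classic POS)."""
    pos = set()
    for block in partition(rows, cond_attrs):
        counts = defaultdict(int)
        for i in block:
            counts[rows[i][decision]] += 1
        if max(counts.values()) / len(block) >= beta:
            pos.update(block)
    return pos

# Toy decision table: two condition attributes and one decision attribute.
table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
]

for attr in ("outlook", "windy"):
    gamma = len(positive_region(table, [attr], "play")) / len(table)
    print(attr, gamma)  # → outlook 0.5, then windy 1.0
```

Lowering `beta` below 1.0 plays the role the abstract assigns to the variable precision positive region: a block that is "almost pure" (e.g. majority class at fraction >= beta) is treated as classifiable, which damps the influence of noisy rows on attribute selection and leaf generation.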
Authors: Qiao Mei (乔梅), Han Wenxiu (韩文秀)
Source: Journal of Tianjin University: Science and Technology (《天津大学学报(自然科学与工程技术版)》; indexed by EI, CAS, CSCD, Peking University Core), 2005, No. 9, pp. 842-846 (5 pages).
Funding: Tianjin Municipal Education Commission University Science and Technology Development Fund (020714); Tianjin University of Technology Science and Technology Development Fund (LG030291).
Keywords: Rough set; decision tree; attribute classification rough degree; predictive pruning; inconsistent data

References (11)

  • 1 Pawlak Z. Rough sets[J]. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356.
  • 2 Hu X H, Cercone N. Learning in relational databases: A rough set approach[J]. Computational Intelligence, 1995, 11(2): 323-338.
  • 3 Skowron A. Extracting laws from decision tables: A Rough set approach[J]. Computational Intelligence, 1995, 11(2): 371-388.
  • 4 Bleyberg M Z, Elumalai A. Using Rough sets to construct sense type decision trees for text categorization[C]. In: IFSA World Congress and 20th NAFIPS International Conference. Vancouver, BC, 2001.
  • 5 Wei Jinmao, Huang Dao, Wang Shuqin, et al. Rough set based decision tree[C]. In: Proceedings of the 4th World Congress on Intelligent Control and Automation. Shanghai, China, 2002: 426-431.
  • 6 Quinlan J R. Induction of decision trees[J]. Machine Learning, 1986, 1(1): 81-106.
  • 7 Quinlan J R. C4.5: Programs for Machine Learning[M]. San Mateo: Morgan Kaufmann, 1993.
  • 8 Esposito F, Malerba D, Semeraro G. A comparative analysis of methods for pruning decision trees[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(5): 476-491.
  • 9 Ziarko W. Variable precision Rough set model[J]. Journal of Computer and System Sciences, 1993, 46(1): 39-59.
  • 10 Ziarko W. Probabilistic decision tables in the variable precision Rough set model[J]. Computational Intelligence, 2001, 17(3): 593-603.

Co-cited references: 91

Citing articles: 9

Secondary citing articles: 56
