
Decision Tree Algorithms Based on a One-Way Analysis of Variance
Cited by: 1
Abstract: The choice of test attribute is the key step in building a decision tree. Based on the principle of one-way analysis of variance, this paper proposes two decision tree algorithms, ANOVA1.0 and ANOVA2.0. ANOVA1.0 selects the test attribute with the largest between-group sum of squares, and ANOVA2.0 selects the test attribute with the largest within-group sum-of-squares gain ratio. Both algorithms are implemented on the Weka-3-5 platform. Experiments on large datasets compare them with ID3 and C4.5 in efficiency, accuracy, and other respects, and the results indicate that the two proposed algorithms are comparatively good classification algorithms.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2007, No. 10, pp. 50-53 (4 pages)
Keywords: decision tree; intergroup sum of squares; intra-group gain ratio of sum of squares
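
The selection rule that the abstract attributes to ANOVA1.0 can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the class label is coded as a number, that the groups are the distinct values of a nominal attribute, and that the between-group sum of squares takes its usual one-way ANOVA form, SSB = Σ n_j (mean_j − grand mean)². The names `between_group_ss` and `select_attribute` and the toy data are hypothetical. The gain ratio used by ANOVA2.0 is not defined in this record, so it is not sketched.

```python
from collections import defaultdict


def between_group_ss(values, labels):
    """Between-group sum of squares of numeric labels, grouped by attribute value.

    Assumption: labels are numeric codes (e.g. 0/1); the paper's own coding
    is not given in this record.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    grand_mean = sum(labels) / len(labels)
    # Sum over groups of n_j * (group mean - grand mean)^2.
    return sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )


def select_attribute(rows, labels, attributes):
    """Return the attribute whose grouping gives the largest between-group SS."""
    return max(
        attributes,
        key=lambda a: between_group_ss([row[a] for row in rows], labels),
    )


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = [1, 1, 0, 0]

best = select_attribute(rows, labels, ["outlook", "windy"])
print(best)
```

On this toy data the grouping by `outlook` puts all of the label variation between the groups, while the grouping by `windy` puts none there, so `outlook` would be chosen as the test attribute under the stated assumptions.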

References (7)

  • 1. Quinlan J R. Induction of Decision Trees[J]. Machine Learning, 1986, 1(1): 81-106.
  • 2. Quinlan J R. C4.5: Programs for Machine Learning[M]. San Mateo: Morgan Kaufmann Publishers, 1993.
  • 3. Kononenko I. On Biases in Estimating Multi-Valued Attributes[A]. Proc of the 14th Int'l Joint Conf on Artificial Intelligence[C]. 1995. 1034-1040.
  • 4. Ho T, Nguyen T. Evaluation of Attribute Selection Measures in Decision Tree Induction[A]. Proc of the 9th Int'l Conf on IEA/AIE[C]. 1996. 413-418.
  • 5. Boutsinas B, Tsekouronas I X. Splitting Data in Decision Trees Using the New False-Positives Criterion[A]. Methods and Applications of Artificial Intelligence: Proc of the 3rd Hellenic Conf on AI[C]. 2004. 174-182.
  • 6. Han Jiawei, Kamber M. Data Mining: Concepts and Techniques (Chinese edition)[M]. Beijing: China Machine Press, 2001.
  • 7. Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed.[M]. Dong Lin, et al., trans. Beijing: China Machine Press, 2006.

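Co-citing documents (40)

Co-cited documents (12)

  • 1. Liu Peng, Yao Zheng, Yin Junjie. An Effective Improved C4.5 Model[J]. Journal of Tsinghua University (Science and Technology), 2006, 46(z1): 996-1001. Cited by: 28
  • 2. Luan Lihua, Ji Genlin. Research on Decision Tree Classification Techniques[J]. Computer Engineering, 2004, 30(9): 94-96. Cited by: 110
  • 3. Liu Dongsheng. A Distributed ID3 Mining Model Based on Mobile Agents[J]. Computer Applications and Software, 2005, 22(10): 49-51. Cited by: 3
  • 4. Chen Na. Current Research Status and Development Directions of Data Mining Technology[J]. Computer and Information Technology, 2006, 14(1): 46-49. Cited by: 30
  • 5. Yang Xuebing, Zhang Jun. Decision Tree Algorithms and Their Core Techniques[J]. Computer Technology and Development, 2007, 17(1): 43-45. Cited by: 84
  • 6. Han Jiawei, Kamber Micheline. Data Mining: Concepts and Techniques[M]. Fan Ming, Meng Xiaofeng, et al., trans. Beijing: China Machine Press, 2007: 424-479.
  • 7. Raileanu L E, Stoffel K. Theoretical comparison between the Gini Index and Information Gain criteria[J]. Annals of Mathematics and Artificial Intelligence, 2004, 41: 77-93.
  • 8. Quinlan J R. C4.5: Programs for machine learning[M]. San Francisco, CA: Morgan Kaufmann Publishers Inc, 1993.
  • 9. Gaddam S R, Phoha V V, Balagani K S. K-Means+ID3: a novel method for supervised anomaly detection by cascading K-Means clustering and ID3 decision tree learning methods[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3): 345-354.
  • 10. Blake C, Merz C. UCI repository of machine learning databases[EB/OL]. (1998-09-14)[2010-08-12]. http://www.ics.uci.edu/~mlearn/MLRepository.html.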

Citing documents (1)

Second-level citing documents (1)
