Abstract
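Selecting the test attribute is the key step in building a decision tree. Based on the principle of one-way analysis of variance, this paper proposes the decision tree algorithms ANOVA1.0 and ANOVA2.0. To select the test attribute, the two algorithms use the maximum intergroup sum of squares and the maximum intra-group gain ratio of sum of squares, respectively, and both are implemented on the WEKA-3-5 platform. Experiments on large datasets that compare them with ID3 and C4.5 in efficiency, accuracy, and other respects show that the two proposed algorithms are good classification algorithms.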
Two new decision tree algorithms, ANOVA1.0 and ANOVA2.0, are presented in this paper. The algorithms are based on one-way analysis of variance. ANOVA1.0 selects the test attribute according to the largest sum of squares between groups. ANOVA2.0 selects the test attribute according to the largest intergroup gain ratio of sum of squares. Both ANOVA1.0 and ANOVA2.0 are implemented in the Weka-3-5 software. The two algorithms are compared with ID3 and C4.5 in performance, precision, and other respects. Experiments with larger datasets are carried out, and the experimental results show that ANOVA1.0 and ANOVA2.0 are good classification algorithms.
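The abstract does not give the formulas, so the following is only a minimal sketch of the general idea behind selecting a test attribute by the largest between-group sum of squares from one-way analysis of variance. It assumes the class label is coded numerically (here 0/1) and that every attribute is categorical. The function names `between_group_ss` and `select_attribute` and the toy attributes `outlook` and `windy` are hypothetical and are not taken from the paper or from Weka.

```python
from collections import defaultdict


def between_group_ss(attr_values, targets):
    """Between-group sum of squares of `targets` grouped by `attr_values`.

    Assumption: `targets` are numeric codes of the class label.
    """
    groups = defaultdict(list)
    for value, y in zip(attr_values, targets):
        groups[value].append(y)
    grand_mean = sum(targets) / len(targets)
    # Sum over groups of n_j * (group mean - grand mean)^2
    return sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )


def select_attribute(rows, targets):
    """Return the attribute whose split gives the largest between-group SS."""
    attributes = list(rows[0].keys())
    return max(
        attributes,
        key=lambda a: between_group_ss([r[a] for r in rows], targets),
    )


# Hypothetical toy data: four instances, two categorical attributes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
targets = [1, 1, 0, 0]

best = select_attribute(rows, targets)
print(best)
```

In this toy data `outlook` separates the two classes completely while `windy` does not separate them at all, so the sketch picks `outlook`. How the paper actually encodes class labels, handles numeric attributes, or defines the gain ratio used by ANOVA2.0 is not stated in the abstract and is not modeled here.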
Source
《计算机工程与科学》
CSCD
2007, No. 10, pp. 50-53 (4 pages)
Computer Engineering & Science
Keywords
decision tree (决策树)
intergroup sum of squares (组间平方和)
intra-group gain ratio of sum of squares (组内平方和增益率)