Abstract
Decision tree algorithms can be trained effectively on a dataset and can then classify it quickly and accurately. ID3 was one of the earliest decision tree algorithms, but it tends to favor attributes with many distinct values, cannot handle continuous data, and is sensitive to noise. C4.5 improves on ID3: it handles continuous-valued attributes and adds support for missing (null) values. Building on a study and analysis of mainstream decision tree algorithms, this paper designs and implements the C4.5 algorithm on the Weka data mining platform, using a second-hand car database as the example. The experimental results show that the algorithm predicts the target attributes in the prediction dataset with reasonable accuracy.
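The differences between ID3 and C4.5 named in the abstract can be illustrated with a minimal sketch. It is not the paper's Weka implementation: the function names, the midpoint rule for candidate thresholds, and the treatment of missing values (skip them and scale the gain by the fraction of known rows) are simplifying assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_and_ratio(keys, labels):
    """Return (information gain, gain ratio) for partitioning rows by `keys`.

    Rows whose key is None (missing) are left out of the partition, and the
    gain is scaled by the fraction of rows with a known value. This is a
    simplified stand-in for C4.5's missing-value handling (assumption).
    """
    known = [(k, y) for k, y in zip(keys, labels) if k is not None]
    if not known:
        return 0.0, 0.0
    groups = defaultdict(list)
    for k, y in known:
        groups[k].append(y)
    n = len(known)
    known_labels = [y for _, y in known]
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = (n / len(labels)) * (entropy(known_labels) - remainder)
    # Split information penalizes partitions into many small groups.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumption of this sketch).
    """
    distinct = sorted({v for v in values if v is not None})
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        keys = [None if v is None else v <= t for v in values]
        gain, _ = gain_and_ratio(keys, labels)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical toy rows, not taken from the paper's second-hand car database.
price_class = ["high", "high", "high", "low", "low", "low"]
row_id = [1, 2, 3, 4, 5, 6]                 # unique per row
brand = ["A", "A", "B", "B", "B", "B"]
mileage = [20, 35, 50, 80, 95, 120]         # continuous attribute

id_gain, id_ratio = gain_and_ratio(row_id, price_class)
brand_gain, brand_ratio = gain_and_ratio(brand, price_class)
threshold, threshold_gain = best_threshold(mileage, price_class)
```

On this toy data, plain information gain (the ID3 criterion) ranks the unique `row_id` above `brand`, while the gain ratio reverses that ranking, and the continuous `mileage` attribute is turned into a single binary test.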
Source
《微型电脑应用》
2015, No. 6, pp. 63-65 (3 pages)
Microcomputer Applications