Abstract
This paper studies the computational efficiency and attribute-selection accuracy of the traditional C4.5 algorithm and proposes an improvement. The algorithm's calculation formulas are simplified using Taylor series and equivalent infinitesimals, which improves computational efficiency. The mean GINI index of the other non-class attributes with respect to the candidate attribute is then introduced into the simplified information gain ratio formula to correct the error caused by redundancy among non-class attributes, which improves the accuracy of attribute selection. The improved algorithm is called G_C4.5. Comparative experiments against the traditional C4.5 algorithm and other improved algorithms show that G_C4.5 achieves some improvement in both classification efficiency and classification accuracy.
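For orientation, the quantities named in the abstract can be sketched with their standard textbook definitions. This is only a minimal illustration, not the G_C4.5 formulas: the abstract does not give the Taylor-series simplification or the way the mean GINI term enters the gain ratio, so neither is reproduced here. All function names are hypothetical, and the direction of the attribute-to-attribute GINI computation (splitting on each other attribute and measuring impurity of the candidate attribute) is an assumption.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gini(items):
    """Gini impurity of a list of discrete values."""
    n = len(items)
    return 1.0 - sum((c / n) ** 2 for c in Counter(items).values())


def partition(keys, items):
    """Group `items` by the value of `keys` at the same position."""
    groups = {}
    for k, x in zip(keys, items):
        groups.setdefault(k, []).append(x)
    return groups


def gain_ratio(attr_values, labels):
    """Standard C4.5 information gain ratio of one attribute."""
    n = len(labels)
    groups = partition(attr_values, labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(attr_values)
    return gain / split_info if split_info > 0 else 0.0


def mean_gini_vs_others(attr_values, other_attrs):
    """Mean weighted Gini impurity of `attr_values` after splitting on each
    other non-class attribute. ASSUMPTION: the abstract does not specify
    the direction or weighting; this is one plausible reading."""
    n = len(attr_values)
    scores = []
    for other in other_attrs:
        groups = partition(other, attr_values)
        scores.append(sum(len(g) / n * gini(g) for g in groups.values()))
    return sum(scores) / len(scores)
```

Under this reading, a low mean GINI value indicates that the other attributes largely determine the candidate attribute, that is, the candidate is redundant; how G_C4.5 uses that value to adjust the gain ratio is described in the paper itself, not in this abstract.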
Source
《计算机工程与设计》
Peking University Core Journal
2016, No. 5, pp. 1265-1270, 1361 (7 pages in total)
Computer Engineering and Design