Abstract
This paper presents an optimization of the C4.5 data mining algorithm that applies in the general case. The original C4.5 algorithm relies heavily on logarithmic operations when computing the information gain ratio of each attribute, whereas the optimized algorithm needs only addition, subtraction, multiplication, and division, so an implementation does not have to call the logarithm function repeatedly. The optimization preserves the ranking of attributes by information gain ratio and therefore leaves the generated decision tree unchanged. As a result, the optimized C4.5 algorithm reduces time complexity and builds the decision tree more efficiently, without changing accuracy or increasing space complexity.
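For orientation, the sketch below shows the standard (unoptimized) gain ratio computation for a categorical attribute, to make visible where the logarithm calls occur. It is only a baseline illustration: the abstract does not spell out the paper's log-free formulas, so they are not reproduced here, and the function names `entropy` and `gain_ratio` are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(items)
    # One math.log2 call per distinct value: this is the cost the paper targets.
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Textbook C4.5 gain ratio of one categorical attribute (baseline, not the optimized form)."""
    total = len(labels)
    # Partition the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

Because C4.5 only compares gain ratios across attributes to pick the split, any reformulation that keeps their ordering intact selects the same attribute at every node, which is the property the abstract relies on.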
Source
Journal of Sanming University (《三明学院学报》)
2013, No. 2, pp. 21-26 (6 pages)
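Funding
Natural Science Foundation of Fujian Province (2012J1283)
Fujian Provincial Department of Education Special Research Program for Provincial Universities (JK2012051)
Key Project of the Sanming Municipal Science and Technology Bureau (2011-G-4)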
Keywords
data mining
algorithm
optimization