Abstract
To address three problems of the C4.5 algorithm (its large number of logarithm operations, interference from irrelevant attributes, and the effect of correlated attributes), this paper proposes an improved C4.5 algorithm based on attribute dependency calculation and principal component analysis (PCA). The calculation formula is simplified using the principle of equivalent infinitesimals. Attribute correlation is handled by computing attribute dependency and by borrowing the compression principle of the PCA algorithm. Two new concepts, "average volatility" and "application weight", are introduced to obtain a new attribute selection measure. The algorithm is applied to the evaluation of students' comprehensive performance, and its performance is compared with the original algorithm on UCI data sets. Experimental results show that the improved algorithm produces more scientific evaluation results, classifies more accurately, and computes more efficiently.
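The abstract does not state the simplified formula itself. As a hedged illustration only, the sketch below computes the standard C4.5 gain ratio and shows one common log-free approximation that follows from the equivalent infinitesimal ln(1 + x) ~ x as x → 0. Writing ln p = ln(1 + (p - 1)) ≈ p - 1 gives -p·log2(p) ≈ p(1 - p)/ln 2. This approximation is most accurate when a class probability p is close to 1. The function names and the approximation are assumptions for illustration, not the paper's exact formula.

```python
import math
from collections import Counter

def entropy(labels):
    """Exact Shannon entropy in bits, as used by standard C4.5."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def approx_entropy(labels):
    """Log-free entropy (hypothetical stand-in for the paper's simplified formula).

    Uses ln p = ln(1 + (p - 1)) ~ p - 1 (equivalent infinitesimal as p -> 1),
    so -p*log2(p) ~ p*(1 - p)/ln 2 and no per-term logarithm is needed."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) / math.log(2)

def gain_ratio(rows, labels, attr, h=entropy):
    """C4.5 gain ratio of the attribute at index `attr`.

    `h` selects the exact or the approximate entropy function."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    # Conditional entropy of the labels after splitting on `attr`.
    cond = sum(len(g) / n * h(g) for g in groups.values())
    # Split information: entropy of the attribute's own value distribution.
    split_info = h([row[attr] for row in rows])
    gain = h(labels) - cond
    return gain / split_info if split_info > 0 else 0.0
```

Passing `h=approx_entropy` swaps the logarithm-based measure for the approximate one without changing the rest of the split-selection code, which makes it easy to compare attribute rankings under both measures.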
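For the attribute dependency step, here is a minimal sketch that assumes the usual rough-set definition. The abstract does not say which definition the paper uses. Under that definition, the dependency of the decision D on an attribute set C is γ_C(D) = |POS_C(D)| / |U|. An attribute whose removal leaves γ unchanged is redundant given the remaining attributes. The helper names `dependency_degree` and `significance` are hypothetical. The abstract does not describe how the paper combines dependency with PCA compression, average volatility, and application weight, so none of that is modeled here.

```python
from collections import defaultdict

def dependency_degree(rows, labels, attrs):
    """Rough-set dependency gamma_attrs(D) = |POS_attrs(D)| / |U|.

    An object lies in the positive region when every object sharing its values
    on `attrs` (its equivalence class) has the same decision label."""
    decisions = defaultdict(set)  # attribute-value tuple -> set of decision labels
    sizes = defaultdict(int)      # attribute-value tuple -> number of objects
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        decisions[key].add(y)
        sizes[key] += 1
    # Sum the sizes of the equivalence classes that are consistent (single label).
    pos = sum(sizes[k] for k, ys in decisions.items() if len(ys) == 1)
    return pos / len(labels)

def significance(rows, labels, attrs, a):
    """Drop in dependency when attribute `a` is removed from `attrs`.

    A value of 0 means `a` adds nothing once the other attributes are known."""
    rest = [b for b in attrs if b != a]
    return dependency_degree(rows, labels, attrs) - dependency_degree(rows, labels, rest)
```

If one attribute is a deterministic function of another, for example a recoded copy of it, each attribute has significance 0 while the other is present. This is the kind of attribute correlation the abstract says the method is designed to handle.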
Source
Transducer and Microsystem Technologies (《传感器与微系统》)
CSCD (Chinese Science Citation Database)
2017, No. 1, pp. 131-134 (4 pages)
Keywords
C4.5 algorithm
attribute dependency calculation
principal component analysis (PCA)
average volatility
application weight