Abstract
Based on an analysis of the principles of the C4.5 algorithm, this paper examines its shortcomings in controlling decision tree size, selecting attributes, filtering noise, and eliminating irrelevant attributes, and argues that attribute reduction of the training data is necessary in decision tree mining. From a practical standpoint, it proposes a decision tree construction model based on attribute reduction that uses a genetic algorithm to search for the optimum, and designs a fitness function for this model. The model is self-adaptive: adjusting the parameters of the fitness function steers the search direction of the genetic algorithm and thereby optimizes the decision tree. Experiments show that after optimization the training set uses fewer attributes, classification accuracy improves to some extent, and the classification rule set becomes smaller. The model therefore has some practical value.
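The abstract does not give the fitness function itself, so the following is only a minimal sketch, assuming a weighted form that rewards classification accuracy and penalizes both the fraction of attributes kept and the number of rules. Each individual is a bit mask over the attributes (1 = keep). The weights `w_acc`, `w_attr`, and `w_rules`, the GA operators (tournament selection, single-point crossover, bit-flip mutation, elitism), and the evaluator `toy_evaluate` are all hypothetical; in practice the evaluator would build a C4.5 tree on the reduced training set and report its accuracy and rule count.

```python
import random
from typing import Callable, Tuple

Mask = Tuple[bool, ...]
# Hypothetical evaluator: builds a tree (e.g. C4.5) on the selected attributes
# and returns (classification accuracy in [0, 1], number of rules).
Evaluator = Callable[[Mask], Tuple[float, int]]


def fitness(mask: Mask, evaluate: Evaluator,
            w_acc: float = 1.0, w_attr: float = 0.1, w_rules: float = 0.01) -> float:
    """Hypothetical weighted fitness: reward accuracy, penalize the fraction
    of attributes kept and the size of the rule set."""
    if not any(mask):
        return float("-inf")  # no attributes selected: no tree can be built
    acc, n_rules = evaluate(mask)
    attr_ratio = sum(mask) / len(mask)
    return w_acc * acc - w_attr * attr_ratio - w_rules * n_rules


def genetic_attribute_reduction(n_attrs: int, evaluate: Evaluator,
                                pop_size: int = 20, generations: int = 30,
                                p_cross: float = 0.8, p_mut: float = 0.05,
                                seed: int = 0, **weights: float) -> Tuple[Mask, float]:
    """Search attribute subsets (bit masks) with a simple generational GA."""
    rng = random.Random(seed)
    cache: dict = {}

    def score(m: Mask) -> float:
        if m not in cache:  # memoize: each evaluation may retrain a tree
            cache[m] = fitness(m, evaluate, **weights)
        return cache[m]

    pop = [tuple(rng.random() < 0.5 for _ in range(n_attrs)) for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        new_pop = [best]  # elitism: the best subset so far always survives
        while len(new_pop) < pop_size:
            # binary tournament selection for each parent
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            if n_attrs > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n_attrs)  # single-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a
            # bit-flip mutation: each gene toggles with probability p_mut
            child = tuple(bit != (rng.random() < p_mut) for bit in child)
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)
    return best, score(best)


def toy_evaluate(mask: Mask) -> Tuple[float, int]:
    """Made-up evaluator for illustration: attributes 0 and 2 are relevant,
    the rest add noise and extra rules."""
    relevant = {0, 2}
    hits = sum(1 for i in relevant if mask[i])
    noise = sum(1 for i, bit in enumerate(mask) if bit and i not in relevant)
    return 0.6 + 0.15 * hits - 0.02 * noise, 2 + 3 * sum(mask)


if __name__ == "__main__":
    best, value = genetic_attribute_reduction(6, toy_evaluate)
    print("selected attributes:", [i for i, bit in enumerate(best) if bit],
          "fitness:", round(value, 4))
```

The weights are what make such a model "self-adaptive" in the sense the abstract describes: passing a larger `w_attr` (for example `genetic_attribute_reduction(6, toy_evaluate, w_attr=0.5)`) pushes the search toward fewer attributes, while a larger `w_rules` favors smaller rule sets at some cost in accuracy.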
Source
Computer Technology and Development (《计算机技术与发展》)
2007, Issue 3, pp. 116-118 (3 pages)
Keywords
decision tree
attribute reduction
genetic algorithm
fitness function