Abstract
To address the problem that decision trees generated by the C4.5 algorithm overfit the training samples and generalize poorly, a decision tree post-pruning algorithm based on Bayesian theory is proposed. The algorithm uses Bayes' theorem on posterior probability to validate every branch of the tree generated by C4.5 and removes the branches that fail the condition, yielding a simpler tree. The algorithm is evaluated on historical discipline approval data provided by the Beijing key disciplines information platform and the Beijing master's and doctoral degree-granting program platform. The experimental results show that the algorithm prunes most unreliable and overfitted branches and, compared with C4.5, achieves higher prediction accuracy and broader coverage when classifying new data.
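The abstract does not state the exact validation condition, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a leaf branch is "reliable" when the posterior probability of its majority class, estimated under a symmetric Dirichlet prior, reaches a threshold. The names `Node`, `posterior`, `prune`, `predict`, the prior strength `alpha`, the threshold `min_posterior`, and the toy data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    counts: dict                       # class label -> training samples reaching this node
    attribute: Optional[str] = None    # attribute tested at this node (None for a leaf)
    children: dict = field(default_factory=dict)  # attribute value -> child Node

    def majority(self):
        return max(self.counts, key=self.counts.get)


def posterior(node, n_classes, alpha=1.0):
    """Posterior mean probability of the node's majority class
    under a symmetric Dirichlet(alpha) prior (assumed criterion)."""
    total = sum(node.counts.values())
    return (node.counts[node.majority()] + alpha) / (total + alpha * n_classes)


def prune(node, n_classes, min_posterior=0.7):
    """Bottom-up post-pruning: drop leaf branches whose posterior is too low."""
    for value in list(node.children):
        child = node.children[value]
        prune(child, n_classes, min_posterior)
        if not child.children and posterior(child, n_classes) < min_posterior:
            del node.children[value]
    if not node.children:
        node.attribute = None  # every branch was removed: the node becomes a leaf
    return node


def predict(node, sample):
    """Follow surviving branches; fall back to the majority class
    of the deepest node reached when a branch has been pruned."""
    while node.children:
        child = node.children.get(sample.get(node.attribute))
        if child is None:
            break
        node = child
    return node.majority()


# Toy tree with made-up counts, standing in for a tree produced by C4.5.
tree = Node(
    counts={"yes": 12, "no": 8},
    attribute="level",
    children={
        "high": Node({"yes": 9, "no": 1}),
        "low": Node({"yes": 1, "no": 6}),
        "mid": Node({"yes": 2, "no": 1}),  # few samples, weak majority
    },
)
prune(tree, n_classes=2)
print(sorted(tree.children))
print(predict(tree, {"level": "mid"}))
```

In this sketch, a sample that would have followed a removed branch is classified by the parent's majority class, which is one simple way to keep the pruned tree usable. The paper may handle removed branches differently.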
Source
《计算机工程与设计》
CSCD
Peking University Core Journals
2013, Issue 11, pp. 3873-3877 (5 pages)
Computer Engineering and Design
Keywords
decision tree C4.5
post-pruning
data mining
decision support