Research of post-pruning decision tree algorithm based on Bayesian theory in discipline evaluation

Cited by: 3
Abstract: To address the problem that decision trees generated by the C4.5 algorithm overfit the training samples and generalize poorly, a post-pruning decision tree algorithm based on Bayesian theory is proposed. The algorithm uses Bayes' theorem (the posterior probability) to validate each branch of the tree generated by C4.5 and removes the branches that do not satisfy the condition, producing a simpler tree. The algorithm is verified experimentally on historical discipline approval data provided by the Beijing key disciplines information platform and the master's and doctoral degree-granting programs platform. The experimental results show that the algorithm prunes most unreliable and overfitted branches and, compared with C4.5, achieves higher prediction accuracy and broader coverage when classifying new data.
Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 11, pp. 3873-3877 (5 pages)
Keywords: decision tree; C4.5; post-pruning; data mining; decision support
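The abstract describes the pruning idea only in outline: check each branch with a Bayesian posterior and drop the branches that fail. The following is a minimal sketch of that general idea, not the paper's actual procedure. The `Node` class, the Laplace-smoothed likelihood, the `alpha` parameter, and the `threshold` of 0.7 are all assumptions introduced for illustration, as are the branch labels and counts in the example tree.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical tree node: class counts of the training samples reaching it."""
    counts: Dict[str, int]
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch label -> subtree


def branch_posterior(counts, class_totals, alpha=1.0):
    """Posterior P(majority class | branch) from Bayes' theorem.

    Assumed model: prior = class frequency in the whole training set,
    likelihood = Laplace-smoothed share of that class that reaches the branch.
    """
    total = sum(class_totals.values())
    joint = {}
    for label, n_label in class_totals.items():
        prior = n_label / total
        likelihood = (counts.get(label, 0) + alpha) / (n_label + 2 * alpha)
        joint[label] = prior * likelihood
    majority = max(counts, key=counts.get)
    return joint[majority] / sum(joint.values())


def prune(node, class_totals, threshold=0.7):
    """Bottom-up pass: remove every branch whose posterior is below threshold."""
    for label in list(node.children):
        child = node.children[label]
        prune(child, class_totals, threshold)
        if branch_posterior(child.counts, class_totals) < threshold:
            del node.children[label]  # samples on this branch fall back to the parent
    return node


# Illustrative tree with made-up counts (not data from the paper).
class_totals = {"approve": 50, "reject": 50}
low = Node({"approve": 5, "reject": 45}, {
    "a": Node({"approve": 2, "reject": 2}),   # evenly split, unreliable branch
    "b": Node({"approve": 3, "reject": 43}),
})
root = Node(class_totals, {"high": Node({"approve": 45, "reject": 5}), "low": low})
prune(root, class_totals)
print(sorted(root.children), sorted(low.children))
```

In this toy tree the evenly split branch `a` has a posterior of 0.5 and is removed, while the other branches clear the assumed threshold and are kept. A real implementation would take the tree and counts from a C4.5 learner and would use whatever acceptance condition the paper defines.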