Optimization of data set and its application based on construction and pruning of decision tree
Cited by: 14
Abstract: To improve the recognition accuracy and generalization ability of intelligent models, anomalous class labels of objects in the data set used for modeling must be detected and corrected. Starting from a formal description of data sets and decision trees, the Gini index gain ratio is used as the criterion for choosing the optimal bisection of continuous condition attributes, and a recursive algorithm generates a binary decision tree in which all objects in each leaf node belong to the same class. Information entropy is then used to evaluate the class distribution of the objects in the leaf nodes of the pruned decision tree, which makes it possible to correct the anomalous class labels in the data set. In essence, generating and pruning the decision tree amounts to partitioning and merging the data space of the continuous condition attributes on the basis of the Gini index and information entropy, and correcting class labels in the process. Experiments and practical applications confirm that decision tree generation and pruning is an effective method for optimizing the class labels of a data set.
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2018, No. 1, pp. 205-211 (7 pages)
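Funding: National High Technology Research and Development Program of China (863 Program) (2009AA062802); National Natural Science Foundation of China (60473125); China National Petroleum Corporation (CNPC) Petroleum Science and Technology Innovation Fund for Young and Middle-aged Researchers (05E7013); National Major Special Project sub-topic fund (G5800-08-ZS-WX); Research Start-up Fund of China University of Petroleum (Beijing), Karamay Campus (RCYJ2016B-03-001)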
Keywords: information entropy; Gini index; decision tree; tree pruning; data optimization
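
The abstract describes bisecting a continuous condition attribute with a Gini-based criterion and then using information entropy to judge the class distribution in a leaf before labels are corrected. The following is a minimal sketch of those building blocks, not the paper's implementation. It rests on several assumptions: the split score is the plain decrease in Gini index (the paper's exact "gain ratio" formula is not given in this record), candidate thresholds are midpoints between consecutive distinct values, and label correction is done by majority vote within a leaf. All function names (`gini`, `entropy`, `best_bisection`, `relabel_by_majority`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_bisection(values, labels):
    """Find the threshold on one continuous attribute with the largest Gini decrease.

    Assumption: the score is parent Gini minus the size-weighted Gini of the
    two children, not necessarily the paper's gain ratio.
    Returns (threshold, gain), or (None, 0.0) if no split improves purity.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    parent = gini([y for _, y in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_t, best_gain


def relabel_by_majority(labels):
    """Assumed correction rule: give every object in a leaf the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)


# Toy data: the object with value 2.0 carries a suspicious label "b".
x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
y = ["a", "b", "a", "b", "b", "b"]

threshold, gain = best_bisection(x, y)
left_leaf = [label for value, label in zip(x, y) if value <= threshold]
print("threshold:", threshold, "gain:", round(gain, 4))
print("left leaf entropy:", round(entropy(left_leaf), 4))
print("corrected left leaf:", relabel_by_majority(left_leaf))
```

In this sketch a leaf with nonzero entropy contains mixed labels, so it is a candidate for label correction. A full version would grow the tree recursively until every leaf is pure and then decide, per pruned subtree, whether merging and relabeling is justified.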