Gini index and decision tree method with mitigating random consistency
Abstract The decision tree model is highly interpretable and underlies machine learning methods such as random forest and deep forest. Choosing the split attribute and split value at each node is the key problem in decision tree algorithms, and it affects the tree's generalization ability, depth, balance, and other important properties. Most traditional attribute selection criteria are defined in terms of concave functions, which gives decision tree algorithms a multivalue bias: they tend to select attributes with many distinct values as the node split attribute. Existing studies have shown that evaluation criteria that mitigate random consistency can reduce classification bias and bias toward the number of clusters. This paper mitigates the random consistency of the Gini index within a standardization framework, and thereby alleviates its multivalue bias. Experiments on artificial datasets verify that the standard Gini index alleviates the multivalue bias of the Gini index and selects attributes that carry decision information. Experiments on 12 benchmark datasets and two image datasets show that the decision tree algorithm based on the standard Gini index achieves higher generalization performance than existing decision tree algorithms that mitigate multivalue bias.
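The multivalue bias described in the abstract can be illustrated with a small sketch. The abstract does not give the formula of the standard Gini index, so the code below does not implement it. As a stand-in, it uses a simple permutation-based chance correction (observed Gini gain minus the mean gain obtained when the attribute values are shuffled), which is an assumption made purely for illustration. The function names `gini`, `gini_gain`, and `chance_corrected_gain` are likewise hypothetical.

```python
import random
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(attr, labels):
    """Impurity reduction from a multiway split on a categorical attribute."""
    groups = defaultdict(list)
    for a, y in zip(attr, labels):
        groups[a].append(y)
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


def chance_corrected_gain(attr, labels, n_perm=200, seed=0):
    """ASSUMED illustration, not the paper's standard Gini index:
    observed gain minus the mean gain under random shuffles of attr."""
    rng = random.Random(seed)
    shuffled = list(attr)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += gini_gain(shuffled, labels)
    return gini_gain(attr, labels) - total / n_perm


# Synthetic data (hypothetical): binary labels, one ID-like attribute with a
# distinct value per sample, and one binary attribute agreeing with the label
# about 90% of the time.
rng = random.Random(42)
n = 200
labels = [rng.randint(0, 1) for _ in range(n)]
id_attr = list(range(n))
informative = [y if rng.random() < 0.9 else 1 - y for y in labels]

if __name__ == "__main__":
    for name, attr in [("id_attr", id_attr), ("informative", informative)]:
        print(name, gini_gain(attr, labels), chance_corrected_gain(attr, labels))
```

In this sketch the ID-like attribute attains the maximum raw Gini gain because every child node is pure, even though it carries no decision information. After subtracting the gain expected under shuffling, its score drops to zero, while the informative attribute keeps a clearly positive score.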
Authors Jieting WANG; Feijiang LI; Jue LI; Yuhua QIAN; Jiye LIANG (Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
Source Scientia Sinica (Informationis) (《中国科学:信息科学》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 1, pp. 159-190 (32 pages)
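Funding Supported by the Science and Technology Innovation 2030 Major Project (Grant No. 2021ZD0112400), the Key Program of the National Natural Science Foundation of China (Grant No. 62136005), the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 62106132, 62306170), the Science and Technology Major Project of Shanxi Province (Grant No. 202201020101006), and the Basic Research Program of Shanxi Province (Grant Nos. 20210302124271, 202103021223026).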
Keywords Gini index; multivalue bias; decision tree; random consistency
