Abstract
In the random forest classification algorithm, the decision trees produced during tree generation and voting differ in classification accuracy, so a small number of trees can degrade the overall classification performance of the forest. In addition, imbalanced data in the dataset can also reduce the classification accuracy of the decision trees. To address these shortcomings, constraints are added to the Bootstrap sampling method to reduce the impact of imbalanced data on tree generation, and out-of-bag (OOB) data together with an imbalance coefficient are used to evaluate and weight the generated decision trees. Experimental results show that the proposed algorithm improves the classification accuracy of random forests on imbalanced data.
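The two ideas named in the abstract, a constrained Bootstrap draw and out-of-bag weighting of each tree's vote, can be illustrated with a minimal Python sketch. The abstract does not state the actual constraint or the form of the imbalance coefficient, so everything specific here is an assumption made for illustration: the constraint is a floor on the minority-class share (`min_minority_frac`, enforced by redrawing), the tree weight is balanced accuracy on the out-of-bag samples (a stand-in for the paper's OOB and imbalance-coefficient weighting), and the base learner is a one-feature decision stump rather than a full decision tree. All function names are hypothetical.

```python
import random
from collections import Counter


def constrained_bootstrap(labels, rng, min_minority_frac=0.3, max_tries=100):
    """Draw bootstrap indices, redrawing until the rarest class reaches an
    assumed minimum share of the sample (falls back to the last draw)."""
    n = len(labels)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    idx = []
    for _ in range(max_tries):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(labels[i] == minority for i in idx) / n >= min_minority_frac:
            break
    return idx


def fit_stump(X, y, idx, rng):
    """Base learner for the sketch: a threshold on one randomly chosen feature."""
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({X[i][f] for i in idx}):
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        correct = left.count(l_lab) + right.count(r_lab)
        if best is None or correct > best[0]:
            best = (correct, t, l_lab, r_lab)
    _, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def balanced_accuracy(tree, X, y, idx):
    """Mean per-class recall on the given samples (assumed weighting score)."""
    hits, totals = Counter(), Counter()
    for i in idx:
        totals[y[i]] += 1
        hits[y[i]] += int(tree(X[i]) == y[i])
    if not totals:
        return 0.0
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def fit_weighted_forest(X, y, n_trees=25, seed=0):
    """Train trees on constrained bootstrap samples; weight each by its OOB score."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = constrained_bootstrap(y, rng)
        oob = sorted(set(range(len(y))) - set(idx))  # samples never drawn
        tree = fit_stump(X, y, idx, rng)
        forest.append((tree, balanced_accuracy(tree, X, y, oob)))
    return forest


def predict(forest, row):
    """Weighted vote: each tree adds its OOB-derived weight to its predicted class."""
    votes = Counter()
    for tree, weight in forest:
        votes[tree(row)] += weight
    return votes.most_common(1)[0][0]
```

Calling `fit_weighted_forest(X, y)` on a list of feature rows and a list of labels returns `(tree, weight)` pairs, and `predict(forest, row)` returns the class with the largest total weight. Scoring each tree on its own out-of-bag samples means the weight is estimated on data the tree did not see, and a per-class score keeps a tree that ignores the minority class from earning a high weight.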
Authors
刘成
王佳斌
洪继炜
Liu Cheng; Wang Jiabin; Hong Jiwei (College of Engineering, Huaqiao University, Quanzhou 362021, China)
Source
《现代计算机》
2023, No. 14, pp. 66-69 (4 pages)
Modern Computer