Abstract
To address the low prediction accuracy of the random forest (RF) algorithm, this paper proposes a comprehensively improved random forest algorithm, CIRF (Comprehensive Improvement of Random Forest). The improved algorithm works as follows: partitioned (area) sampling is performed on the original sample data set; factor analysis is used to compute the factor score of each attribute in each class; feature attributes are selected with weights given by these factor scores to build a feature subset; the splitting attribute for each node is chosen from that subset by the maximum information gain ratio (GainRatio) criterion, and the node is split; finally, the classification result is output by weighted voting. The improved algorithm reduces the correlation between the decision trees and improves the algorithm's robustness to noise. Simulation experiments on data sets including Iris, Blood and Yeast show that CIRF is more accurate than the original RF algorithm. Applied to early warning of China's fiscal risk, the algorithm shows good warning performance.
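The abstract names the CIRF steps but gives no formulas, so the Python sketch below is only a rough illustration under stated assumptions, not the author's implementation. It assumes that "partitioned (area) sampling" means a bootstrap drawn separately within each class; that the factor-score weights are already computed (factor analysis itself is not shown) and are positive numbers used to draw features without replacement in proportion to their weight; that node splits use the standard C4.5 gain ratio on categorical features; and that each tree's voting weight is supplied by the caller, since the abstract does not say how CIRF derives it. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5 gain ratio for splitting on one categorical feature:
    GainRatio = Gain / SplitInfo."""
    n = len(labels)
    groups = defaultdict(list)
    for value, y in zip(feature_values, labels):
        groups[value].append(y)
    # Gain: parent entropy minus size-weighted entropy of the child groups.
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # SplitInfo: entropy of the partition itself; penalises many-valued features.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def stratified_bootstrap(labels, rng):
    """ASSUMPTION: 'partitioned sampling' read as a bootstrap drawn within each
    class, so every class keeps its original count in the sample."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choice(indices) for _ in range(len(indices)))
    return sample


def weighted_feature_subset(weights, k, rng):
    """Draw up to k distinct feature indices, each draw proportional to weight.
    ASSUMPTION: weights are positive numbers derived elsewhere (e.g. factor scores)."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        j = rng.choices(remaining, weights=[weights[r] for r in remaining], k=1)[0]
        chosen.append(j)
        remaining.remove(j)
    return chosen


def best_split_feature(rows, labels, candidates):
    """Pick the candidate feature index with the largest gain ratio."""
    return max(candidates, key=lambda f: gain_ratio([row[f] for row in rows], labels))


def weighted_vote(tree_predictions, tree_weights):
    """Return the class whose trees have the largest total weight.
    ASSUMPTION: tree weights are supplied by the caller (e.g. out-of-bag accuracy)."""
    score = defaultdict(float)
    for pred, w in zip(tree_predictions, tree_weights):
        score[pred] += w
    return max(score, key=score.get)


if __name__ == "__main__":
    outlook = ["sunny", "sunny", "rain", "rain"]
    play = ["no", "no", "yes", "yes"]
    print(gain_ratio(outlook, play))                        # 1.0
    print(weighted_vote(["A", "B", "B"], [0.9, 0.3, 0.4]))  # A
```

In the last line a single heavily weighted tree outvotes two lighter ones, which plain majority voting would not allow. Compared with standard RF, which draws candidate features uniformly, weighting the draw makes higher-scoring attributes appear as split candidates more often.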
Author
Liu Xinwen (刘新雯), School of Mathematical, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2018, Issue 9, pp. 73-78, 84 (7 pages)