
An Improved Random Forest Algorithm Based on Spark (cited by: 3)
Abstract: This paper proposes an improved random forest algorithm (SP-RF). The algorithm is parallelized on Spark by building a data-sampling index table and a random-feature index table. The AUC value of each decision tree in the forest is computed and used to assign weights to trees with different classification abilities, which improves the classification accuracy of the random forest in the voting stage. Experimental results show that the improved algorithm raises classification accuracy by 5% on average and reduces running time by more than 25% on average.
Authors: Duan Wenjie (段文杰); Tong Mengjun (童孟军) (School of Information Engineering, Zhejiang A&F University, Hangzhou 311300, Zhejiang, China; Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information, Hangzhou 311300, Zhejiang, China)
Source: Computer Applications and Software (《计算机应用与软件》), PKU Core Journal, 2021, No. 8, pp. 275-279 (5 pages)
Funding: National Natural Science Foundation of China (31570629); Natural Science Foundation of Zhejiang Province (LY16F020036).
Keywords: random forest; Spark; AUC; parallelization; big data
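
The abstract names two ideas, per-tree index tables and AUC-based vote weights, but gives neither the table layout nor the weighting formula. The following is only a minimal sketch in plain Python (no Spark) under stated assumptions: each tree stores row indices drawn with replacement and feature indices drawn without replacement instead of a copy of the data, and each tree's vote weight is its AUC normalized over the forest. All function names are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def build_index_tables(n_rows, n_features, n_trees, n_sub_features, seed=0):
    """Build per-tree index tables so trees reference the data by index.

    Assumption: rows are bootstrapped (with replacement) and features are
    sampled without replacement; the paper's exact layout is not given.
    """
    rng = random.Random(seed)
    sample_index = {
        t: [rng.randrange(n_rows) for _ in range(n_rows)]
        for t in range(n_trees)
    }
    feature_index = {
        t: sorted(rng.sample(range(n_features), n_sub_features))
        for t in range(n_trees)
    }
    return sample_index, feature_index


def auc(labels, scores):
    """Binary AUC as P(score of a positive > score of a negative).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_weights(tree_aucs):
    """Assumed weighting rule: each tree's AUC divided by the sum of AUCs."""
    total = sum(tree_aucs)
    return [a / total for a in tree_aucs]


def weighted_vote(predictions, weights):
    """Return the class whose supporting trees have the largest total weight."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)


if __name__ == "__main__":
    rows, feats = build_index_tables(
        n_rows=8, n_features=5, n_trees=3, n_sub_features=2, seed=42
    )
    print(rows[0], feats[0])

    # One strong tree predicts class 1; two weak trees predict class 0.
    weights = auc_weights([0.95, 0.50, 0.40])
    print(weighted_vote([1, 0, 0], weights))  # prints 1
```

With equal weights the same three predictions resolve to class 0 by simple majority; under the assumed AUC weighting the single stronger tree outweighs the two weaker ones. In a Spark setting, an index table of this kind could be shared with workers so that each tree is trained from a shared dataset rather than a per-tree copy, though the abstract does not state how SP-RF distributes it.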
