
Improved parallel random forest algorithm and its out-of-bag estimator (cited by: 4)
Abstract: The traditional out-of-bag (OOB) estimator records the correspondence between the global data set and the trees in order to estimate generalization error. However, the parallel random forest algorithm based on MapReduce (MR_RF) is built on multiple data blocks that are invisible to one another. This paper analyzes the differences between MR_RF and random forest (RF) and designs a new OOB generalization error estimator suited to MR_RF. Its key idea is to confine the OOB calculation to each data block and to take the average of the block results as the generalization error estimate of the final forest. Experiments show that, with the default partition, the new OOB estimate is close to the cross-validation result, but it deviates as the number of blocks increases. The paper analyzes possible reasons for this deviation and proposes the idea of a selective ensemble scheme. In addition, block size is inversely proportional to classification accuracy and proportional to classification speed, so when dealing with large-scale classification problems the block size must be adjusted to trade off accuracy against speed.
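The block-local estimate described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it runs on a single machine rather than MapReduce, uses one-split stumps in place of full decision trees to stay short, and the names `fit_stump`, `block_oob_error`, and `mr_rf_oob_error` are hypothetical. The only parts taken from the abstract are that OOB error is computed inside each block and that the final estimate is the average over blocks.

```python
import math
import random
from collections import Counter


def fit_stump(rows, rng):
    """Fit a one-split tree on (features, label) rows using a random feature subset."""
    n_features = len(rows[0][0])
    k = max(1, int(math.sqrt(n_features)))  # assumed subset size, as in common RF practice
    best = None
    for f in rng.sample(range(n_features), k):
        for x, _ in rows:
            t = x[f]
            left = [y for xi, y in rows if xi[f] <= t]  # never empty: contains x itself
            right = [y for xi, y in rows if xi[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def block_oob_error(block, n_trees, rng):
    """OOB error of a small forest trained on one block only."""
    n = len(block)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample within the block
        in_bag = set(idx)
        tree = fit_stump([block[i] for i in idx], rng)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw sample i may vote on it
                votes[i][tree(block[i][0])] += 1
    scored = [(v.most_common(1)[0][0], block[i][1]) for i, v in enumerate(votes) if v]
    if not scored:
        return None
    return sum(pred != y for pred, y in scored) / len(scored)


def mr_rf_oob_error(data, n_blocks, n_trees, seed=0):
    """Average the block-local OOB errors, mimicking mutually invisible blocks."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    errors = [block_oob_error(b, n_trees, rng) for b in blocks if b]
    errors = [e for e in errors if e is not None]
    if not errors:
        raise ValueError("no block produced an OOB estimate")
    return sum(errors) / len(errors)
```

Calling `mr_rf_oob_error(data, n_blocks=3, n_trees=15)` on a list of `(feature_tuple, label)` pairs returns a value in [0, 1]. Raising `n_blocks` shrinks each block, which is the setting in which the abstract reports that the estimate starts to deviate from cross validation.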
Authors: Qian Xuezhong (钱雪忠); Qin Jing (秦静); Song Wei (宋威) (Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 6, pp. 1651-1654 (4 pages)
Funding: National Natural Science Foundation of China (61673193); Fundamental Research Funds for the Central Universities (JUSRP51510, JUSRP51635B)
Keywords: MapReduce; random forest (RF); out-of-bag estimator; generalization error; cross validation
