Journal Article

Research on random forest classification algorithm based on sensitivity degree in Hadoop environment

Cited by: 2
Abstract: When applied to imbalanced dataset classification tasks in a big-data setting, the traditional Random Forest classification algorithm tends to neglect minority-class samples and suffers from low efficiency. To address these problems, a Random Forest classification algorithm based on sensitivity degree in the Hadoop environment is proposed. The algorithm introduces methods from feature selection in text classification and is parallelized with the MapReduce programming model on the Hadoop cloud computing platform. Experiments compare the classification of imbalanced data by the proposed algorithm and by the traditional Random Forest classification algorithm. The results show that the proposed algorithm significantly improves on the performance of the traditional Random Forest classifier and offers high efficiency and easy scalability.
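Note: the paper's sensitivity-degree measure and its MapReduce implementation are not given on this record page. The snippet below is only a minimal single-machine sketch in Python of the general idea described in the abstract: each "map" task trains one decision tree on a bootstrap sample, with scikit-learn's class_weight="balanced" used as a stand-in for the sensitivity-based weighting of minority-class samples, and the "reduce" step aggregates the trees by majority vote. All function names, the weighting heuristic, and the toy data are illustrative assumptions, not the authors' definitions.

```python
# Minimal single-machine sketch of a MapReduce-style random forest with
# class weighting standing in for the paper's sensitivity degree.
# Names and the weighting heuristic are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def map_train_tree(X, y, seed):
    """'Map' step: train one tree on a bootstrap sample of the partition.

    class_weight='balanced' up-weights the minority class; the actual
    sensitivity-degree weighting is defined in the paper, not here.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  class_weight="balanced",
                                  random_state=seed)
    return tree.fit(X[idx], y[idx])


def reduce_vote(trees, X):
    """'Reduce' step: majority vote over the per-partition trees."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)


if __name__ == "__main__":
    # Imbalanced toy data: roughly 90% majority class, 10% minority class.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    forest = [map_train_tree(X, y, seed) for seed in range(25)]
    pred = reduce_vote(forest, X)
    minority = y == 1
    print("minority-class recall:", (pred[minority] == 1).mean())
```

In the Hadoop setting described in the abstract, each map task would instead read one split of the training data from HDFS and emit its serialized tree, with the reducer collecting the trees into the final forest; the voting above then corresponds to the prediction phase.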
Source: Journal of Inner Mongolia University of Science and Technology (CAS), 2016, No. 3, pp. 297-301 (5 pages)
Fund: National Natural Science Foundation of China (Grant No. 71363040)
Keywords: classification; cloud computing; MapReduce; Random Forest; feature selection