
The Research and Implementation of Cloud Platform Based Random Forest Algorithm on High Dimensional Data

Cited by: 3
Abstract: The random forest algorithm is widely used in data mining; by building multiple different decision trees, it achieves better classification results. However, as data volumes grow, practitioners increasingly face large-scale datasets with high-dimensional attributes. Traditional random forest construction algorithms cannot process massive high-dimensional data effectively or quickly, which seriously degrades classification efficiency and, in turn, prediction efficiency. This paper addresses random forest construction on high-dimensional, massive data, improves the efficiency of the algorithm, and proposes a random forest construction algorithm based on a cloud computing platform. The algorithm completes classification and prediction quickly, and the experimental results further demonstrate its efficiency and scalability.
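Author: Xu Min (许旻)
Affiliation: Suzhou Vocational University (苏州市职业大学)
Source: Bulletin of Science and Technology (《科技通报》), Peking University Core Journal, 2014, No. 6, pp. 222-224 (3 pages)
Funding: Jiangsu Province Modern Educational Technology Research Project, 2012 (2012-R-21903)
Keywords: high-dimensional data; massive data; cloud platform; random forest; decision tree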