Abstract
The random forest algorithm is widely used in data mining; by building multiple distinct decision trees, it can achieve better classification results. As data volumes grow, however, practitioners increasingly face large-scale data sets with high-dimensional attributes. Traditional random forest construction algorithms cannot process massive high-dimensional data effectively or quickly, which severely degrades classification efficiency and, in turn, prediction efficiency. Targeting random forest construction on high-dimensional, massive data, this paper improves the efficiency of the algorithm and proposes a random forest construction algorithm based on a cloud computing platform. The algorithm completes classification and prediction quickly, and the experimental results further demonstrate its efficiency and scalability.
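The abstract does not describe the construction procedure, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a map-style step in which each worker fits one tree on a bootstrap sample with a random feature subset, followed by a majority vote at prediction time. For brevity each tree is a one-split stump, and a local thread pool stands in for cloud workers. All names (`train_stump`, `build_forest`, `predict`) and parameters are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(task):
    """Map step (assumed): fit a one-split tree on a bootstrap sample."""
    rows, labels, n_features, seed = task
    rng = random.Random(seed)
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
    feats = rng.sample(range(len(rows[0])), n_features)   # random feature subset
    best = None
    for f in feats:
        for t in sorted({rows[i][f] for i in idx}):
            left = [labels[i] for i in idx if rows[i][f] <= t]
            right = [labels[i] for i in idx if rows[i][f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            # Misclassified samples if each side predicts its majority label.
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No usable split: always predict the sample's majority class.
        maj = Counter(labels[i] for i in idx).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def build_forest(rows, labels, n_trees=25, n_features=1, workers=4):
    """Train the trees independently; threads stand in for cloud workers."""
    tasks = [(rows, labels, n_features, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_stump, tasks))


def predict(forest, row):
    """Aggregate step: majority vote over all trees."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rows = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3),
            (0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.8, 0.7)]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    forest = build_forest(rows, labels)
    print(predict(forest, (0.05, 0.05)), predict(forest, (0.95, 0.95)))
```

Because the trees are trained independently, the per-tree step parallelizes naturally; that independence is what makes a distributed or cloud-based construction plausible, though the paper's actual partitioning scheme may differ.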
Source
Bulletin of Science and Technology (《科技通报》)
PKU Core Journal (北大核心)
2014, Issue 6, pp. 222-224 (3 pages)
Funding
Jiangsu Province Modern Educational Technology Research Project, 2012 (2012-R-21903)
Keywords
high-dimensional data
massive data
cloud platform
random forest
decision tree