Abstract
Random forests is a well-known ensemble learning method that is widely used in data classification and nonparametric regression. In this paper, we review three main theoretical aspects of random forests: the convergence theorem, the generalization error bound, and out-of-bag estimation. Finally, we present an improved random forests algorithm that uses feature-weighted subspace sampling to select a subset of features at each node when growing trees. The new method is suited to classification problems on very high-dimensional data.
Source
《集成技术》
2013, Issue 1, pp. 1-7 (7 pages in total)
Journal of Integration Technology
Keywords
random forests
data mining
machine learning