A new random projection-based ensemble classifier for high-dimensional data

Times cited: 4
Abstract: To address the classification of high-dimensional data, a decision tree ensemble method based on random projection, called projection forest (PJForest), was proposed. The method uses decision trees as base classifiers. A series of random projections reduces the dimensionality of the data, a decision tree is built on each dimension-reduced dataset, and the resulting trees are combined through ensemble learning into a single ensemble classifier. Reducing dimensionality with a suitable random projection preserves the information contained in the geometric structure of the data. At the same time, perturbing the raw data through random projections increases the diversity of the decision trees, so that, after proper ensembling, the influence of noise can be effectively overcome and the generalization ability of PJForest improved. A limiting property of the PJForest generalization error was proved, and a convergence rate of the generalization error was obtained in a certain sense. Extensive simulation studies were conducted, and real-world data were analyzed empirically. The simulation results showed that PJForest can effectively classify high-dimensional data containing a large amount of noise and achieves better classification performance than existing methods such as random forest and XGBoost.
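
The pipeline summarized in the abstract (random projection, one decision tree per projection, then aggregation) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the Gaussian projection matrix with N(0, 1/k) entries, the target dimension `k`, the depth-limited Gini tree, plain majority voting, and the toy data generator are all assumptions chosen for illustration, and names such as `ProjectionForestSketch` and `make_noisy_data` are hypothetical.

```python
import math
import random
from collections import Counter

def gaussian_projection(p, k, rng):
    """Assumed scheme: p x k matrix with i.i.d. N(0, 1/k) entries (JL-style)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]

def project(X, R):
    """Map each p-dimensional row x of X to the k-dimensional row x @ R."""
    k = len(R[0])
    return [[sum(xj * R[j][c] for j, xj in enumerate(x)) for c in range(k)] for x in X]

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((m / n) ** 2 for m in Counter(labels).values()) if n else 0.0

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) that most lowers weighted Gini."""
    n, best, best_score = len(y), None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:  # guard against float rounding at the midpoint
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth, depth=0):
    """Grow a small CART-style tree: leaves are labels, internal nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if depth < max_depth and len(set(y)) > 1 else None
    if split is None:
        return majority
    f, t = split
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], max_depth, depth + 1),
            build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], max_depth, depth + 1))

def predict_tree(node, x):
    """Follow (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class ProjectionForestSketch:
    """Hypothetical PJForest-style ensemble: one tree per random projection."""

    def __init__(self, n_trees=15, k=5, max_depth=3, seed=0):
        self.n_trees, self.k, self.max_depth, self.seed = n_trees, k, max_depth, seed
        self.members = []  # list of (projection matrix R, tree fitted on X @ R)

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.members = []
        for _ in range(self.n_trees):
            R = gaussian_projection(len(X[0]), self.k, rng)  # fresh projection per tree
            self.members.append((R, build_tree(project(X, R), y, self.max_depth)))
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Assumed aggregation rule: plain majority vote over the trees.
            votes = Counter(predict_tree(tree, project([x], R)[0]) for R, tree in self.members)
            preds.append(votes.most_common(1)[0][0])
        return preds

def make_noisy_data(n, p=50, informative=10, shift=1.5, seed=1):
    """Hypothetical toy data: only the first `informative` of p coordinates carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = shift if label == 1 else -shift
        X.append([rng.gauss(mean if j < informative else 0.0, 1.0) for j in range(p)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X_train, y_train = make_noisy_data(100, seed=1)
    X_test, y_test = make_noisy_data(100, seed=2)
    model = ProjectionForestSketch().fit(X_train, y_train)
    preds = model.predict(X_test)
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    print(f"test accuracy: {accuracy:.2f}")
```

In this sketch every tree sees the same training rows through a different random projection, so the diversity the abstract mentions comes from the projections alone; bootstrap resampling is deliberately omitted. The paper's actual projection distribution, target dimension, and combination rule may differ. For larger data one would likely swap the hand-written parts for scikit-learn's `GaussianRandomProjection` and `DecisionTreeClassifier`.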
Authors: CUI Wenquan (崔文泉), HUANG Yuqiao (黄禹侨); Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
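Source: Journal of University of Science and Technology of China (《中国科学技术大学学报》, JUSTC), 2019, No. 12, pp. 974-984 (11 pages). Indexed in CAS, CSCD, and the Peking University core journal list.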
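Funding: Supported by the National Natural Science Foundation of China (71873128) and the Natural Science Foundation of Anhui Province (1308085MA02).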
Keywords: decision tree; diversity; high-dimensional; classification; ensemble learning; random projection
