
Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data (基于特征选择的高维数据集成学习方法研究). Cited by: 3
Abstract: Analysis of the prediction error and the bias-variance decomposition of ensemble learning shows that ensembles built from a limited number of accurate and diverse base learners achieve better generalization accuracy. A two-stage feature-selection ensemble learning method is constructed using information entropy. In the first stage, a base feature set B is built from the features whose accuracy, judged by relative classification information entropy, exceeds 0.5. In the second stage, independence within B is evaluated by a mutual-information criterion and independent feature subsets are constructed with a greedy algorithm; the Jaccard coefficient is then used to measure diversity among the feature subsets, and diverse, independent subsets are selected to build the base learners. Data experiments show that the execution efficiency and test accuracy of the optimized method exceed those of ordinary Bagging, with the largest gains on multi-class high-dimensional datasets, but the method is not suitable for binary classification problems.
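The record contains no code, but the two-stage procedure described in the abstract can be illustrated with a short sketch. The Python example below is a hypothetical reconstruction, not the authors' implementation: it substitutes scikit-learn's mutual_info_score for the paper's relative classification information entropy, and the digits dataset, the decision-tree base learners, the 0.5/0.6 thresholds and the number of candidate subsets are all illustrative assumptions.

# Minimal sketch of a two-stage entropy-based feature-selection ensemble.
# All thresholds, the dataset and the base-learner choice are assumptions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)                  # multi-class, 64 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def jaccard(a, b):
    # Jaccard similarity between two sets of feature indices.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Stage 1: rank features by mutual information with the label (a stand-in for
# the paper's relative classification information entropy) and keep the more
# informative half as the base feature set B.
mi_label = np.array([mutual_info_score(X_tr[:, j].astype(int), y_tr)
                     for j in range(X_tr.shape[1])])
B = np.argsort(mi_label)[::-1][: X_tr.shape[1] // 2]

# Pairwise feature-feature mutual information inside B (stage-2 independence criterion).
mi_ff = np.zeros((len(B), len(B)))
for a in range(len(B)):
    for b in range(a + 1, len(B)):
        mi_ff[a, b] = mi_ff[b, a] = mutual_info_score(
            X_tr[:, B[a]].astype(int), X_tr[:, B[b]].astype(int))

# Stage 2: greedily grow feature subsets so that each added feature has low
# mutual information with those already chosen, then keep only subsets that
# are pairwise diverse according to the Jaccard coefficient.
rng = np.random.default_rng(0)
subsets = []
for _ in range(20):                                   # candidate subsets
    order = rng.permutation(len(B))
    chosen = [order[0]]
    for i in order[1:]:
        if all(mi_ff[i, k] < 0.5 for k in chosen):    # independence check
            chosen.append(i)
    if all(jaccard(chosen, s) < 0.6 for s in subsets):  # diversity check
        subsets.append(chosen)

# Train one base learner per retained subset and combine by majority vote.
learners = [DecisionTreeClassifier(random_state=0).fit(X_tr[:, B[s]], y_tr)
            for s in subsets]
votes = np.stack([clf.predict(X_te[:, B[s]]) for clf, s in zip(learners, subsets)])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble test accuracy:", (pred == y_te).mean())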
Authors: ZHOU Gang (周钢); GUO Fu-liang (郭福亮) (Naval University of Engineering, Wuhan 430033, China)
Institution: Naval University of Engineering
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2021, Issue S01, pp. 250-254 (5 pages)
Keywords: ensemble learning; diversity; feature selection; information entropy; high-dimensional data