
Ensemble learning based feature selection for imbalanced problems (cited by 5)
Abstract: Traditional feature selection methods optimize mainly for accuracy and do not fully account for skewed class distributions, so they perform poorly on imbalanced datasets. The proposed method repeatedly draws, with replacement, random sample subsets from the majority class of an imbalanced dataset, so that each drawn subset matches the minority class in size; each drawn subset is then combined with the minority-class samples to form a new training set. Feature selection is performed on each new training set, and the results are combined by ensemble voting: the final feature subset consists of the features that receive votes from more than half of the training sets. Experimental results on imbalanced UCI datasets show that the proposed method performs well and is an effective feature selection approach for imbalanced problems.
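The resampling-and-voting procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the per-subset base selector, so a simple class-mean-difference scorer is used here as a placeholder, and the function name, `n_rounds`, and `k` are illustrative choices.

```python
import numpy as np

def ensemble_feature_selection(X, y, minority_label, n_rounds=11, k=5, rng=None):
    """Sketch of the abstract's scheme: repeatedly undersample the majority
    class (with replacement) down to the minority-class size, select features
    on each balanced subset, and keep the features chosen by more than half
    of the rounds. The per-subset scorer below is a placeholder."""
    rng = np.random.default_rng(rng)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    votes = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        # Draw, with replacement, as many majority samples as there are
        # minority samples, and merge the two into a balanced training set.
        sampled = rng.choice(majority, size=minority.size, replace=True)
        idx = np.concatenate([minority, sampled])
        Xb, yb = X[idx], y[idx]
        # Placeholder base selector: rank features by the absolute difference
        # of per-class feature means, then vote for the top-k features.
        score = np.abs(Xb[yb == minority_label].mean(axis=0)
                       - Xb[yb != minority_label].mean(axis=0))
        votes[np.argsort(score)[-k:]] += 1
    # Majority vote: keep features selected in more than half of the rounds.
    return np.flatnonzero(votes > n_rounds / 2)
```

Because every round trains on a balanced subset, no single undersampling decides the outcome; a feature survives only if it is useful across many random views of the majority class.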
Source: Journal of Shandong University (Engineering Science), CAS, PKU Core, 2011, No. 3, pp. 7-11, 22 (6 pages)
Funding: National Natural Science Foundation of China (61070061); Natural Science Foundation of Guangdong Province (9151026005000002); Guangdong High-level Talent Project
Keywords: imbalanced data; feature selection; ensemble learning; sampling
