Ensemble Feature Selection Based on Normalized Mutual Information and Diversity (cited by 3)
Abstract: How to construct base classifiers with high diversity is a central problem in ensemble learning. To address it, an iterative selection method is proposed: an optimal feature subset is extracted under the criterion of maximizing normalized mutual information, and a base classifier is trained on that subset; the resulting classifier is then evaluated with the number of misclassified samples as the diversity measure. If the acceptance condition is met, the iteration stops; otherwise it repeats until termination. Finally, the recognition results of the selected base classifiers are fused by weighted voting. The algorithm was validated in experiments on public UCI data sets with a support vector machine as the base classifier, and compared against a single SVM (Single-SVM), the classical Bagging ensemble (Bagging-SVM), and the Attribute Bagging ensemble (AB-SVM). The experimental results show that the method achieves higher classification accuracy.
Source: Computer Science (《计算机科学》), CSCD / Peking University core journal, 2013, No. 6: 225-228 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60975026, 61273275)
Keywords: Ensemble learning, Ensemble feature selection, Mutual information, Diversity
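
The paper itself includes no code. Below is a minimal Python sketch of the loop the abstract describes, under explicit assumptions: scikit-learn's normalized_mutual_info_score over discretized feature columns stands in for the normalized-mutual-information criterion (one common normalization is NMI(X_j; Y) = I(X_j; Y) / sqrt(H(X_j) H(Y)); the paper's exact definition is not reproduced here), NMI-biased random subset sampling replaces the unspecified subset-search step, and the average disagreement between misclassified-sample patterns serves as the diversity test. The dataset, subset size, thresholds, and round count are illustrative choices, not the authors'.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import normalized_mutual_info_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import SVC

# A UCI-style binary classification task, split into train/test.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
n_classes = len(np.unique(y))

# Discretize each feature so NMI against the class labels is well defined,
# then rank features by normalized mutual information.
X_tr_d = KBinsDiscretizer(n_bins=10, encode="ordinal",
                          strategy="uniform").fit_transform(X_tr)
nmi = np.array([normalized_mutual_info_score(X_tr_d[:, j], y_tr)
                for j in range(X_tr.shape[1])])

rng = np.random.default_rng(0)
members = []                  # accepted (feature subset, classifier, weight)
member_errors = []            # training-set error pattern of each member
T, k, min_div = 20, 10, 0.02  # rounds, subset size, diversity bar (illustrative)

for _ in range(T):
    # Draw a candidate feature subset, biased toward high-NMI features.
    subset = rng.choice(X_tr.shape[1], size=k, replace=False, p=nmi / nmi.sum())
    clf = SVC(kernel="rbf", gamma="scale").fit(X_tr[:, subset], y_tr)
    errors = clf.predict(X_tr[:, subset]) != y_tr

    # Diversity test: average disagreement between this classifier's
    # misclassified-sample pattern and those of the accepted members.
    if member_errors:
        disagreement = np.mean([np.mean(errors != e) for e in member_errors])
        if disagreement < min_div:
            continue          # too similar to the ensemble: discard, iterate
    member_errors.append(errors)
    members.append((subset, clf, 1.0 - errors.mean()))  # accuracy as weight

def predict(X_new):
    """Weighted voting over the accepted base classifiers."""
    votes = np.zeros((len(X_new), n_classes))
    for subset, clf, w in members:
        votes[np.arange(len(X_new)), clf.predict(X_new[:, subset])] += w
    return votes.argmax(axis=1)

print("ensemble size:", len(members),
      "| test accuracy:", round(float(np.mean(predict(X_te) == y_te)), 4))
```

Candidates whose error pattern is too close to the ensemble's are discarded and the loop iterates, mirroring the accept-or-retry structure in the abstract; accepted members then vote with their training accuracy as weight.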

References (19; first 10 shown)

  • 1 Opitz D. Feature selection for ensembles[C]//Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99). Orlando, FL, USA, 1999: 379-384.
  • 2 Ho T K. The random subspace method for constructing decision forests[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8): 832-844.
  • 3 Bryll R, Gutierrez-Osuna R, Quek F. Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets[J]. Pattern Recognition, 2003, 36(6): 1291-1302.
  • 4 Oliveira L S, Morita M, Sabourin R. Multi-objective genetic algorithms to create ensemble of classifiers[C]//Proc. of EMO 2005. Guanajuato, Mexico, 2005: 592-606.
  • 5 Li Xia, Wang Lianxi, Jiang Shengyi. Ensemble feature selection for imbalanced problems[J]. Journal of Shandong University (Engineering Science), 2011, 41(3): 7-11. (in Chinese)
  • 6 Sun Liang, Han Chongzhao, Shen Jianjing, Dai Ning. A generalized rough set method for ensemble feature selection and multiple classifier fusion[J]. Acta Automatica Sinica, 2008, 34(3): 298-304. (in Chinese)
  • 7 Zhang Hongda, Wang Xiaodan, Han Jun, Xu Hailong. Research on the diversity of classifier ensembles[J]. Systems Engineering and Electronics, 2009, 31(12): 3007-3012. (in Chinese)
  • 8 Dietterich T G. Ensemble methods in machine learning[C]//Proc. of the 1st Int'l Workshop on Multiple Classifier Systems (MCS 2000). Italy: LNCS, Springer, 2000: 1-15.
  • 9 Kuncheva L I, Skurichina M, Duin R P W. An experimental study on diversity for bagging and boosting with linear classifiers[J]. Information Fusion, 2002, 3: 245-258.
  • 10 Dietterich T G. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization[J]. Machine Learning, 2000, 40(2): 139-158.
