
The Comparison of Three Measures in Feature Selection (Cited by: 9)
Abstract: Either linear or nonlinear correlations may exist between features, and between features and the class, in different kinds of datasets. Because feature selection methods based on different measures select noticeably different features on different kinds of datasets, this paper studies three representative linear or nonlinear measures: the linear correlation coefficient, symmetrical uncertainty, and mutual information. Each measure is applied within the fast correlation-based filter (FCBF) feature selection method, and the selected feature subsets are compared experimentally on 8 gene microarray and image datasets. Experimental results indicate that FCBF with the linear correlation coefficient obtains better classification accuracy on gene microarray datasets than on image datasets, while FCBF with mutual information or symmetrical uncertainty tends to obtain better results on image datasets; moreover, FCBF with symmetrical uncertainty is the most robust across both types of datasets.
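The three measures and the FCBF procedure described in the abstract can be sketched as follows. This is an illustrative reimplementation from the measures' standard definitions, not the authors' code; it assumes continuous features have already been discretized before the entropy-based measures are applied, and the `fcbf` function keeps only the method's two core steps (relevance ranking, then redundancy removal).

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (base 2) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetrical_uncertainty(xs, ys):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(xs, ys) / (hx + hy)

def linear_correlation(xs, ys):
    """Pearson linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fcbf(features, labels, delta=0.0):
    """Simplified sketch of the fast correlation-based filter (FCBF).
    features: list of discrete-valued columns; labels: class column.
    Returns the indices of the selected features."""
    # Relevance step: keep features whose SU with the class exceeds delta,
    # ranked by decreasing SU (ties broken by original index).
    ranked = sorted(
        ((symmetrical_uncertainty(f, labels), i)
         for i, f in enumerate(features)),
        key=lambda t: (-t[0], t[1]))
    ranked = [(su, i) for su, i in ranked if su > delta]
    # Redundancy step: drop a feature if an already-kept, more relevant
    # feature correlates with it at least as strongly as the class does.
    selected = []
    for su_i, i in ranked:
        if all(symmetrical_uncertainty(features[j], features[i]) < su_i
               for j in selected):
            selected.append(i)
    return selected
```

A small example of why the choice of measure matters: for `x = [-1, 0, 1]` and `y = [1, 0, 1]` (a nonlinear, roughly quadratic relation), `linear_correlation(x, y)` is 0 even though `mutual_information(x, y)` is clearly positive, so a purely linear measure would discard a feature that the nonlinear measures retain.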
Source: Journal of Harbin University of Science and Technology (CAS; PKU Core), 2018, No. 1, pp. 111-116 (6 pages)
Funding: Heilongjiang Province New Century Excellent Talents in University Program (1155-ncet-008); Heilongjiang Province Education Science Planning Project (GBC1211062); Heilongjiang Provincial Natural Science Foundation (QC2015084)
Keywords: feature selection; linear correlation coefficient; symmetrical uncertainty; mutual information; fast correlation-based filter (FCBF)

References (2)

Secondary references (20)

1. Li Yingxin, Li Jiangeng, Ruan Xiaogang. Feature gene selection and analysis methods for tumor gene expression profile classification [J]. Chinese Journal of Computers, 2006, 29(2): 324-330. (Cited by: 45)
2. Mitchell T M. Machine Learning. New York: McGraw-Hill, 1997.
3. Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd Edition. New York: John Wiley & Sons, 2000.
4. Rennie J D, Shih L, Teevan J, Karger D R. Tackling the poor assumptions of naive Bayes text classifiers // Proceedings of the 20th International Conference on Machine Learning. Washington DC, 2003: 616-623.
5. Joachims T. Text categorization with support vector machines: Learning with many relevant features // Proceedings of the 10th European Conference on Machine Learning. Chemnitz, DE, 1998: 137-142.
6. Dash M, Liu H. Feature selection for classification. International Journal of Intelligent Data Analysis, 1997, 1: 131-156.
7. Kohavi R, John G H. Wrappers for feature subset selection. Artificial Intelligence, 1997, 97: 273-324.
8. Das S. Filters, wrappers and a boosting-based hybrid for feature selection // Proceedings of the 18th International Conference on Machine Learning. Williams College, 2001: 74-81.
9. Yang Y, Pedersen J O. A comparative study on feature selection in text categorization // Proceedings of the 14th International Conference on Machine Learning. Nashville, 1997: 412-420.
10. Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004, 5: 1205-1224.

Co-cited documents: 68 · Co-citation documents: 75 · Citing documents: 9 · Secondary citing documents: 42
