
Feature Selection Based on the Measurement of Correlation Information Entropy
Cited by: 39
Abstract: Feature selection aims to choose a small subset of the original feature set that yields comparable or better performance in data mining and machine learning tasks. Because it leaves the physical meaning of the features unchanged, a smaller feature set also makes the data easier to interpret. Traditional information-theoretic methods tend to assess feature relevance and redundancy separately, and so cannot judge the combined effect of a whole feature subset. This paper applies correlation information entropy, a theory from the data fusion field, to feature selection and uses it to measure the degree of independence and redundancy among features. A feature correlation matrix is constructed from the mutual information between features and the class label together with combinations of feature pairs; computing the eigenvalues of this matrix takes into account the multivariable relationships among the features in the subset. A feature ranking method is proposed and, combined with a parameter analysis, an adaptive feature subset selection method. Experimental results demonstrate the effectiveness and efficiency of the proposed methods on classification tasks.
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2016, No. 8, pp. 1684-1695 (12 pages)
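Funding: National Natural Science Foundation of China (61472095, 61502116); Open Fund of the Key Laboratory of Intelligent Education and Information Engineering, Heilongjiang Provincial Department of Education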
Keywords: feature selection; correlation information entropy; group effect; multivariable correlation; correlation matrix
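
The idea summarized in the abstract, scoring a feature subset through the eigenvalues of a correlation matrix, can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes discrete features, fills the matrix with the symmetric uncertainty (a normalized mutual information) between feature pairs, and computes a correlation information entropy as the base-N entropy of the normalized eigenvalues, which is the form this quantity commonly takes in the data fusion literature. The function names and the choice of symmetric uncertainty are hypothetical, and the class-label term, the ranking step, and the adaptive subset selection described in the abstract are not reproduced here.

```python
from collections import Counter

import numpy as np


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete (hashable) values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """Mutual information normalized to [0, 1] (assumed matrix entry)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    su = 2.0 * mutual_information(x, y) / (hx + hy)
    return max(0.0, min(1.0, su))


def correlation_matrix(X):
    """Symmetric matrix of pairwise symmetric uncertainty, ones on the diagonal."""
    n = X.shape[1]
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = symmetric_uncertainty(
                X[:, i].tolist(), X[:, j].tolist()
            )
    return R


def correlation_information_entropy(R):
    """Base-N entropy of the normalized eigenvalues of an N x N matrix R.

    Values near 1 indicate nearly independent variables; values near 0
    indicate nearly fully redundant variables.
    """
    R = np.asarray(R, dtype=float)
    if R.ndim != 2 or R.shape[0] != R.shape[1] or R.shape[0] < 2:
        raise ValueError("R must be a square matrix with at least 2 rows")
    n = R.shape[0]
    # Assumption: small negative eigenvalues (the matrix built above is not
    # guaranteed to be positive semidefinite) are clipped to zero.
    eigvals = np.clip(np.linalg.eigvalsh(R), 0.0, None)
    p = eigvals / eigvals.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(n))


# Toy data: columns 0 and 1 are identical, column 2 is independent of both.
X = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]])
score_all = correlation_information_entropy(correlation_matrix(X))
score_redundant = correlation_information_entropy(correlation_matrix(X[:, [0, 1]]))
score_independent = correlation_information_entropy(correlation_matrix(X[:, [0, 2]]))
```

Under these assumptions a subset of mutually independent features scores close to 1 and a subset of duplicated features scores close to 0, so the score reflects the redundancy of the subset as a whole instead of one feature pair at a time.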
