Interference entropy feature selection method for two-class distinguishing ability
Abstract: Aiming at the inability of existing feature selection methods to measure the overlap/separation between different classes of data, an Interference Entropy method for evaluating the Two-class Distinguishing ability of features (IET-CD) is proposed. For a feature containing samples of two classes (positive and negative), first, the mixed conditional probability of negative-class samples lying within the range of the positive-class data, and the probability of negative-class samples being attributed to the positive class, are calculated. Then, the confusion probability is computed from the mixed conditional probability and the attribution probability, and the confusion probability is used to compute the positive-class interference entropy; the negative-class interference entropy is computed in the same way. Finally, the sum of the positive- and negative-class interference entropies is taken as the two-class interference entropy of the feature. The interference entropy evaluates a feature's ability to distinguish the two classes of samples: the smaller a feature's interference entropy, the stronger its two-class distinguishing ability. On three UCI datasets and one simulated gene expression dataset, each method selected its five best features, and the two-class distinguishing ability of these features was compared in order to compare the performance of the methods. The experimental results show that the proposed method is comparable to or better than the Neighborhood Entropy Feature Selection (NEFS) method, and that, compared with the Single-indexed Neighborhood Entropy Feature Selection (SNEFS) method, feature selection based on Max-Relevance and Min-Redundancy (MRMR), Joint Mutual Information (JMI) and Relief, the proposed method is better in most cases. The IET-CD method can effectively select features with better two-class distinguishing ability.
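The pipeline the abstract describes (mixed conditional probability → attribution probability → confusion probability → per-class interference entropy → sum over both classes) can be sketched as below. The concrete probability estimates used here (fraction of intruding samples falling inside the home class's value range, inside-range attribution ratio, and combining the two by a product) are illustrative assumptions; the exact definitions are those given in the paper itself.

```python
import numpy as np

def interference_entropy(x, y):
    """Illustrative sketch of a two-class interference entropy (IET-CD idea).

    x : 1-D array of feature values; y : 1-D array of class labels (0 or 1).
    For each class, estimate how strongly the other class intrudes into its
    value range, turn that into a confusion probability, and accumulate the
    binary entropy of that probability. Lower values suggest a feature that
    separates the two classes better.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    total = 0.0
    for cls in (0, 1):
        own = x[y == cls]            # samples of the "home" class
        other = x[y != cls]          # samples of the intruding class
        lo, hi = own.min(), own.max()
        inside = (other >= lo) & (other <= hi)
        # mixed conditional probability (assumed estimate):
        # fraction of intruding samples inside the home class's range
        p_mix = inside.mean()
        # attribution probability (assumed estimate): intruders inside the
        # range relative to all samples falling in that range
        n_other_in = int(inside.sum())
        p_attr = n_other_in / (n_other_in + len(own))
        # confusion probability (assumed combination: product)
        p_conf = p_mix * p_attr
        # binary entropy of the confusion probability for this class
        if 0.0 < p_conf < 1.0:
            total += -(p_conf * np.log2(p_conf)
                       + (1.0 - p_conf) * np.log2(1.0 - p_conf))
    return total
```

Used as a filter-style score, features would then be ranked by ascending interference entropy and, as in the experiments, the five smallest-scoring features retained.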
Authors: ZENG Yuanpeng; WANG Kaijun; LIN Song (College of Mathematics and Informatics, Fujian Normal University, Fuzhou, Fujian 350007, China; Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou, Fujian 350007, China)
Source: Journal of Computer Applications (CSCD; Peking University Core), 2020, No. 3: 626-630 (5 pages)
Funding: National Natural Science Foundation of China (61672157, 61772134); Natural Science Foundation of Fujian Province (2018J01778); China Postdoctoral Science Foundation (2016M600494)
Keywords: feature selection; two-class distinguishing ability; conditional probability; interference entropy