
Multi-label Feature Selection Algorithm Based on Improved Label Correlation (基于标签关系改进的多标签特征选择算法)

Cited by: 2
Abstract: Multi-label feature selection is one of the main methods for overcoming the curse of dimensionality. It reduces the feature dimension while improving learning efficiency and classification performance. However, existing feature selection algorithms do not take the correlation between labels into account, and the range over which they measure information is biased across data sets. To address these problems, this paper proposes a multi-label feature selection algorithm based on improved label correlation. The algorithm first introduces symmetrical uncertainty to normalize the information measures, then uses the normalized mutual information as the measure of correlation and defines an importance weight for each label from it. These weights are applied to the label-related terms in the dependency and redundancy measures. A feature score function is then proposed as the criterion of feature importance, and the highest-scoring features are selected one at a time to form the best feature subset. Experimental results show that, compared with other algorithms, the more accurate low-dimensional feature subset extracted by this algorithm effectively improves the performance of multi-label learning algorithms for entity information mining and also improves the efficiency of multi-label learning algorithms based on discrete features.
Authors: CHEN Fu-cai (陈福才), LI Si-hao (李思豪), ZHANG Jian-peng (张建朋), HUANG Rui-yang (黄瑞阳) (National Digital Switching System Engineering and Technological R&D Center)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2018, Issue 6, pp. 228-234 (7 pages)
Funding: National Key R&D Program of China (2016YFB0800101); Innovative Research Groups Project of the National Natural Science Foundation of China (61521003)
Keywords: multi-label feature selection; label correlation; dependency; redundancy; feature score
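
The pipeline outlined in the abstract (normalize mutual information with symmetrical uncertainty, derive label weights, score features, select greedily) can be illustrated with a minimal sketch. Only the symmetrical uncertainty formula, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), is standard. The abstract does not give the paper's weight or score definitions, so `label_weights` and the score used in `select_features` are hypothetical stand-ins chosen for illustration, not the authors' formulas.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def symmetrical_uncertainty(xs, ys):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)); lies in [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    if denom == 0:  # both variables are constant
        return 0.0
    return 2.0 * mutual_information(xs, ys) / denom


def label_weights(labels):
    """ASSUMPTION (not from the paper): a label's weight is its share of
    the total SU it has with all other labels; uniform if all SU are 0."""
    names = list(labels)
    raw = {
        a: sum(symmetrical_uncertainty(labels[a], labels[b])
               for b in names if b != a)
        for a in names
    }
    total = sum(raw.values())
    if total == 0:
        return {a: 1.0 / len(names) for a in names}
    return {a: raw[a] / total for a in names}


def select_features(features, labels, k):
    """Greedily pick k features, highest score first.

    ASSUMPTION (not from the paper): score = label-weighted dependency
    minus mean SU redundancy with the already selected features.
    """
    w = label_weights(labels)
    dependency = {
        f: sum(w[l] * symmetrical_uncertainty(features[f], labels[l])
               for l in labels)
        for f in features
    }
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return dependency[f]
            redundancy = sum(symmetrical_uncertainty(features[f], features[s])
                             for s in selected) / len(selected)
            return dependency[f] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a", "d" is constant.
demo_labels = {"y1": [0, 0, 1, 1, 0, 0, 1, 1], "y2": [0, 1, 0, 1, 0, 1, 0, 1]}
demo_features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
    "d": [0, 0, 0, 0, 0, 0, 0, 0],
}
print(select_features(demo_features, demo_labels, 2))
```

On this toy data the sketch picks `a` first and then `c`: the duplicate `b` is penalized for redundancy with `a`, and the constant `d` carries no information about either label. Because SU is bounded in [0, 1], dependency and redundancy terms are on a comparable scale regardless of each variable's entropy, which is the motivation the abstract gives for the normalization.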