Multilabel feature selection algorithm based on Fisher score and fuzzy neighborhood entropy (cited by: 2)
Abstract The Fisher score does not fully consider the correlations between features and labels or among labels, and some neighborhood rough set models tend to neglect the uncertainty of knowledge granules in the boundary region; both problems lower classification performance. To address them, a multilabel feature selection algorithm based on Fisher score and fuzzy neighborhood entropy (MLFSF) was proposed. Firstly, the Maximal Information Coefficient (MIC) was used to measure the degree of association between features and labels, and a feature-label relationship matrix was constructed; a label relationship matrix was defined on the basis of the adjusted cosine similarity to analyze the correlation between labels. Secondly, a second-order strategy was given to obtain multiple second-order label relationship groups, by which the multilabel universe was repartitioned; the score of each feature was obtained by enhancing strong correlations and weakening weak correlations between labels, thereby improving the Fisher score model, which was used to preprocess the multilabel data. Thirdly, the multilabel classification margin was introduced to define an adaptive neighborhood radius and neighborhood classes, and the upper and lower approximation sets were constructed. On this basis, a multilabel rough membership function was proposed to map the multilabel neighborhood rough set to a fuzzy set; upper and lower approximation sets and a multilabel fuzzy neighborhood rough set model were then developed from the multilabel fuzzy neighborhood. The fuzzy neighborhood entropy and the multilabel fuzzy neighborhood entropy were defined accordingly to measure the uncertainty of the boundary region effectively. Finally, a Multilabel Fisher Score-based feature selection algorithm with second-order Label Correlation (MFSLC) was designed, and MLFSF was built on it. Experimental results on 11 multilabel datasets with the Multi-Label K-Nearest Neighbor (MLKNN) classifier show that, compared with six state-of-the-art algorithms including the Multilabel Feature Selection algorithm based on improved ReliefF (MFSR), MLFSF improves the mean Average Precision (AP) by 2.47 to 6.66 percentage points; moreover, on most datasets MLFSF achieves the best value on all five evaluation metrics.
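The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the paper's MLFSF or MFSLC: the abstract gives no formulas, so the code below uses the classical single-label Fisher score and assumes that the adjusted cosine similarity between two labels centers each sample's label vector on its own mean, as is common in item-based collaborative filtering. The function names `fisher_score` and `adjusted_cosine` are hypothetical.

```python
from math import sqrt
from statistics import fmean


def fisher_score(values, labels):
    """Classical single-label Fisher score of one feature (illustrative only).

    score = sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c
    A larger score means the feature separates the classes better.
    """
    overall_mean = fmean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    between = 0.0  # weighted scatter of the class means around the overall mean
    within = 0.0   # weighted variance inside each class
    for group in groups.values():
        n_c = len(group)
        mean_c = fmean(group)
        var_c = fmean([(v - mean_c) ** 2 for v in group])  # population variance
        between += n_c * (mean_c - overall_mean) ** 2
        within += n_c * var_c
    return between / within if within > 0 else float("inf")


def adjusted_cosine(label_matrix, i, j):
    """Adjusted cosine similarity between label columns i and j.

    Assumption: each row (sample) is centered on its own mean label value
    before the cosine is taken; the paper may define this differently.
    """
    num = den_i = den_j = 0.0
    for row in label_matrix:
        row_mean = fmean(row)
        a = row[i] - row_mean
        b = row[j] - row_mean
        num += a * b
        den_i += a * a
        den_j += b * b
    if den_i == 0 or den_j == 0:
        return 0.0  # a constant column carries no correlation information
    return num / sqrt(den_i * den_j)


# Toy data: one feature measured on six samples, and a 4 x 3 label matrix.
feature = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
classes = [0, 0, 0, 1, 1, 1]
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]

print(fisher_score(feature, classes))
print(adjusted_cosine(Y, 0, 1), adjusted_cosine(Y, 0, 2))
```

In this toy data the feature separates the two classes cleanly, so its score is high, and labels 0 and 1 always co-occur while label 2 is their complement, so the first pair is strongly positively related and the second strongly negatively related. A method in the spirit of the abstract would use such label relations to reweight per-label feature scores, but that weighting scheme is not specified here.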
Authors SUN Lin; MA Tianjiao; XUE Zhan’ao (College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China; College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China; Engineering Lab of Intelligence Business & Internet of Things of Henan Province (Henan Normal University), Xinxiang Henan 453007, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 12, pp. 3779-3789 (11 pages)
Funding National Natural Science Foundation of China (62076089, 61976082).
Keywords multilabel learning; feature selection; Fisher score; multilabel fuzzy neighborhood rough set; fuzzy neighborhood entropy