Abstract
To fully exploit features that single-metric algorithms overlook but that benefit classification, this paper proposes an online multi-label feature selection algorithm based on sub-correlation features and neighborhood mutual information. The importance and correlation of each newly arrived feature are computed, and the difference in significance is analyzed to divide features into salient features and sub-correlation features. Neighborhood interaction information is used to analyze the redundancy between a newly arrived feature and the selected feature set, and features with low dependency are eliminated, gradually improving the quality of the feature subset. A measurement index based on global linear and nonlinear relationships is constructed and used to calculate the local correlation of features, which effectively mines the sub-correlation features. To account for the sub-correlation features present in the feature space, they are stripped from the feature set and stored separately, so that during redundancy analysis they are not eliminated because of the salient features' high sensitivity to the measurement index. A feature selection index is established, and features are selected iteratively according to it. Experimental results show that the proposed algorithm is effective and stable.
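The abstract does not give the paper's formulas, so the following is only a minimal sketch of the quantity named in the title, using the commonly cited definition of neighborhood mutual information, NMI(R;S) = -(1/n) Σ log(|δ_R(x_i)| · |δ_S(x_i)| / (n · |δ_{R∪S}(x_i)|)). The function names, the Chebyshev distance, and the radius `delta` are assumptions made for illustration and are not taken from the paper.

```python
import math


def neighborhoods(rows, delta):
    """For each sample, return the set of sample indices whose Chebyshev
    distance to it is at most delta (assumed distance; hypothetical helper)."""
    n = len(rows)
    result = []
    for i in range(n):
        near = {
            j
            for j in range(n)
            if max(abs(a - b) for a, b in zip(rows[i], rows[j])) <= delta
        }
        result.append(near)
    return result


def neighborhood_mutual_information(xs, ys, delta=0.15):
    """Neighborhood mutual information between two blocks of numeric columns.

    xs and ys are lists of rows (one row per sample, same sample order).
    Under Chebyshev distance, the neighborhood in the joint space is the
    intersection of the two per-block neighborhoods.
    """
    n = len(xs)
    nx = neighborhoods(xs, delta)
    ny = neighborhoods(ys, delta)
    total = 0.0
    for i in range(n):
        joint = nx[i] & ny[i]  # always contains i, so it is never empty
        total += math.log2(len(nx[i]) * len(ny[i]) / (n * len(joint)))
    return -total / n


if __name__ == "__main__":
    feature = [[0.0], [0.1], [0.9], [1.0]]
    label = [[0.0], [0.1], [0.9], [1.0]]
    print(neighborhood_mutual_information(feature, label))
```

In an online setting, a score of this kind could be computed between each newly arrived feature and the label space (relevance) or the already selected features (redundancy); how the paper combines these scores with its sub-correlation criterion is not specified in the abstract.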
Authors
程雨轩
毛煜
张小清
曾艺祥
林耀进
CHENG Yuxuan; MAO Yu; ZHANG Xiaoqing; ZENG Yixiang; LIN Yaojin (School of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China; Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou 363000, Fujian, China)
Source
《山东大学学报(理学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 5, pp. 70-81 (12 pages)
Journal of Shandong University (Natural Science)
Funding
Supported by the Natural Science Foundation of Fujian Province (2022J01914).
Keywords
online feature selection
multi-label learning
neighborhood entropy
neighborhood mutual information
sub-correlation feature