Abstract
Feature selection is an important preprocessing step in multi-label learning. Existing multi-label classification methods do not consider how the proportion of each label affects the correlation between features and labels, and they cannot handle weak-label data effectively. To address these issues, a weak-label feature selection method based on affinity propagation (AP) clustering and mutual information is proposed. First, building on AP clustering, the remaining label information is combined with sample similarity to construct a probability filling formula that predicts the values of missing labels and completes them. Second, the prior probability is used to define the label proportion, which is combined with mutual information to build a correlation metric that evaluates the degree of correlation between features and the label set. Finally, a weak-label feature selection algorithm is designed to improve classification performance on weak-label data. Simulation experiments on six multi-label datasets show that the algorithm achieves good classification performance on multiple metrics and outperforms several related multi-label feature selection algorithms, which verifies its effectiveness.
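The three steps summarized in the abstract (fill missing labels within clusters, weight labels by their prior proportion, score features by mutual information) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: cluster assignments are taken as a precomputed input (for example from an AP clustering run), the fill rule is a simple within-cluster frequency threshold, and the label weights are normalized priors. The function names `fill_missing_labels`, `label_proportions`, and `rank_features` are hypothetical and are not the authors' algorithm.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def fill_missing_labels(Y, clusters, threshold=0.5):
    """Replace None entries using known labels of samples in the same cluster.

    Assumed rule (not from the paper): estimate the probability of a positive
    label as its frequency among known values in the cluster, then threshold.
    """
    filled = [row[:] for row in Y]
    for i, row in enumerate(Y):
        for j, value in enumerate(row):
            if value is None:
                known = [Y[k][j] for k in range(len(Y))
                         if clusters[k] == clusters[i] and Y[k][j] is not None]
                p = sum(known) / len(known) if known else 0.0
                filled[i][j] = 1 if p >= threshold else 0
    return filled


def label_proportions(Y):
    """Prior probability of each label being positive, normalized to sum to 1."""
    n = len(Y)
    priors = [sum(row[j] for row in Y) / n for j in range(len(Y[0]))]
    total = sum(priors)
    return [p / total if total else 0.0 for p in priors]


def rank_features(X, Y):
    """Score each discrete feature by proportion-weighted mutual information."""
    weights = label_proportions(Y)
    scores = []
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        scores.append(sum(w * mutual_information(column, [row[j] for row in Y])
                          for j, w in enumerate(weights)))
    order = sorted(range(len(scores)), key=lambda f: -scores[f])
    return order, scores


# Toy data: 4 samples, 2 discrete features, 2 labels, one missing label (None).
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
Y = [[0, 1], [0, None], [1, 1], [1, 1]]
clusters = [0, 0, 1, 1]  # assumed to come from a prior clustering step

Y_filled = fill_missing_labels(Y, clusters)
order, scores = rank_features(X, Y_filled)
```

In this toy example the first feature matches the first label exactly, so it is ranked ahead of the second feature. Continuous features would need discretization before this kind of counting-based estimate applies.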
Authors
Sun Lin (孙林)
Shi Enhui (施恩惠)
Si Shanshan (司珊珊)
Xu Jiucheng (徐久成)
College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
Source
Journal of Nanjing Normal University (Natural Science Edition) (《南京师大学报(自然科学版)》)
CAS
CSCD
PKU Core Journals (北大核心)
2022, Issue 3, pp. 108-115 (8 pages)
Funding
National Natural Science Foundation of China (62076089, 61772176, 61976082)
Science and Technology Research Project of Henan Province (212102210136)
Keywords
multi-label learning
feature selection
AP clustering
mutual information
missing labels