Abstract
Multi-label learning is an active research topic in machine learning because it can effectively address real-world problems in which one object carries multiple semantics. In multi-label learning tasks, the multiple labels of a sample are correlated, and ignoring these correlations degrades the generalization performance of the model. This paper proposes a K-Nearest Neighbor (KNN) algorithm for multi-label learning based on the correlation between labels. To fully exploit the correlation among a sample's labels, the FP-growth algorithm is used to obtain frequent itemsets of labels. A scoring model and a threshold model are constructed for frequent items and for labels. The scoring model measures the degree of association between a sample and a frequent item or label. The threshold model determines the discrimination threshold for each frequent item or label. Combining the two models, the algorithm predicts which frequent items a sample belongs to and from these determines the sample's label set. Experimental results on the classical Emotions and Scene datasets show that the algorithm achieves F1-Measure values of 66.6% and 73.3%, respectively. Compared with benchmark methods such as CC, LP, RAKEL, and MLDF, its F1-Measure is higher by an average of 3.8 and 2.1 percentage points, respectively. By making reasonable use of the correlation between labels, the algorithm effectively improves classification performance.
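The abstract names the pipeline (frequent label itemsets, then KNN-based scoring and threshold models, then label-set prediction) but does not give the exact formulas. The Python sketch below shows one plausible reading. It is not the authors' method, and it rests on loudly stated assumptions:

- Brute-force subset counting, capped at `max_size`, stands in for FP-growth.
- The scoring model is taken to be the fraction of the k nearest neighbours whose label set contains the itemset.
- The threshold model is taken to be the per-itemset cut-off that maximizes leave-one-out training accuracy.
- The predicted label set is the union of all accepted itemsets.

All function names, `k`, `min_support`, and `max_size` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_label_itemsets(label_sets, min_support, max_size=3):
    """Return label subsets (size <= max_size) with relative support >= min_support.

    ASSUMPTION: brute-force subset counting is a simple stand-in for FP-growth.
    """
    counts = Counter()
    for labels in label_sets:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(labels), size):
                counts[frozenset(combo)] += 1
    n = len(label_sets)
    return [s for s, c in counts.items() if c / n >= min_support]


def knn_indices(X, x, k, exclude=None):
    """Indices of the k training points closest to x (Euclidean distance)."""
    ranked = sorted((math.dist(xi, x), i) for i, xi in enumerate(X) if i != exclude)
    return [i for _, i in ranked[:k]]


def score(itemset, neighbor_ids, label_sets):
    """Hypothetical scoring model: share of neighbours whose label set contains itemset."""
    return sum(itemset <= label_sets[i] for i in neighbor_ids) / len(neighbor_ids)


def fit_thresholds(X, label_sets, itemsets, k):
    """Hypothetical threshold model: for each itemset, pick the cut-off that
    maximizes leave-one-out accuracy on the training data."""
    loo = [knn_indices(X, x, k, exclude=i) for i, x in enumerate(X)]
    thresholds = {}
    for s in itemsets:
        scores = [score(s, nb, label_sets) for nb in loo]
        truth = [s <= labels for labels in label_sets]
        best_t, best_acc = 0.5, -1
        for t in sorted(set(scores)):
            acc = sum((sc >= t) == y for sc, y in zip(scores, truth))
            if acc > best_acc:
                best_t, best_acc = t, acc
        thresholds[s] = best_t
    return thresholds


def predict(X, label_sets, itemsets, thresholds, x, k):
    """Union of the labels of every frequent itemset whose score passes its threshold."""
    nb = knn_indices(X, x, k)
    predicted = set()
    for s in itemsets:
        if score(s, nb, label_sets) >= thresholds[s]:
            predicted |= s
    return predicted


# Toy data: two well-separated clusters; labels "a" and "b" always co-occur.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
Y_train = [frozenset("ab")] * 3 + [frozenset("c")] * 3
itemsets = frequent_label_itemsets(Y_train, min_support=0.3)
thresholds = fit_thresholds(X_train, Y_train, itemsets, k=2)
print(predict(X_train, Y_train, itemsets, thresholds, (0.5, 0.5), k=2))
```

On this toy data the frequent itemset {"a", "b"} is accepted as a unit, so the query point near the first cluster receives both correlated labels. The print should show the set of "a" and "b", though element order may vary. Predicting through itemsets rather than through each label independently is what lets label co-occurrence influence the output.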
Authors
QIAN Long; ZHAO Jing; HAN Jingyu; MAO Yi (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal
2022, No. 6, pp. 73-78, 88 (7 pages)
Funding
National Natural Science Foundation of China (62002174).
Keywords
machine learning
multi-label learning
label correlation
K-nearest neighbor
frequent item-sets