Abstract
In multi-label learning, each sample can be associated with multiple labels at the same time, which gives it a wider range of applications than single-label learning and has drawn more attention. Multi-label learning can also exploit the correlations between labels to improve algorithm performance. Traditional feature selection algorithms are no longer applicable in this setting. On the one hand, they are generally designed around evaluation criteria for a single label, whereas multi-label learning requires optimizing multiple labels simultaneously; on the other hand, correlation information exists between different labels. It is therefore necessary to design a feature selection algorithm that can handle multi-label problems and can extract and exploit the correlation information between labels. Building on the large-margin-based multi-label feature selection algorithm and designing an optimized objective loss function, this paper proposes a multi-label feature selection algorithm based on the exponential loss margin. The algorithm uses sample similarity to fuse the information in the feature space with that in the label space, and it is independent of any specific classification algorithm or transformation strategy. Experiments on real-world datasets verify the correctness of the proposed algorithm and show that it achieves better classification performance than other feature selection algorithms.
Author
LI Yu-ting (李雨婷), School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source
Computer Technology and Development (《计算机技术与发展》), 2020, No. 4, pp. 46-51 (6 pages)
Funding
National Natural Science Foundation of China (61603197, 61772284, 61876091).
Keywords
multi-label learning
feature selection
classification margin
exponential loss