Abstract
Feature selection is a key preprocessing step in multi-label learning and can effectively mitigate the curse of dimensionality in high-dimensional multi-label data. Most existing multi-label learning methods describe labels with a logical distribution, in which every label relevant to an instance is treated as equally important; in many real-world applications, however, the labels of an instance differ in importance. To address this, this paper proposes a label enhancement algorithm based on fuzzy similarity, which measures the fuzzy similarity relation on labels among instances and converts conventional multi-label data into label distribution data. The discernibility relation on labels and the fuzzy relative discernibility relation on features are then analyzed for label distribution data, the fuzzy discernibility on the label space and on the feature space is defined, and a feature significance measure is constructed to assess the discernibility ability of each feature. On this basis, a feature selection algorithm for label distribution data is built, which returns the features in descending order of feature significance. Finally, comparative experiments on several multi-label datasets further verify the effectiveness and feasibility of the proposed algorithm.
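The two stages summarized in the abstract (label enhancement, then significance-based feature ranking) can be illustrated with a rough sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the similarity measure (one minus the mean absolute difference of min-max scaled features), the similarity-weighted vote used for enhancement, the total variation distance between label distributions, and the function names `fuzzy_similarity`, `enhance_labels`, and `rank_features` are hypothetical stand-ins, not the authors' definitions.

```python
import numpy as np


def _min_max_scale(X):
    """Scale each feature to [0, 1]; constant features map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return (X - lo) / span


def fuzzy_similarity(X):
    """Assumed similarity: 1 - mean absolute difference of scaled features."""
    Z = _min_max_scale(X)
    diff = np.abs(Z[:, None, :] - Z[None, :, :]).mean(axis=2)
    return 1.0 - diff  # n x n matrix with values in [0, 1], diagonal = 1


def enhance_labels(X, Y):
    """Convert logical labels Y (n x q, entries 0/1) to label distributions.

    Assumed rule: each relevant label receives the similarity-weighted
    share of instances that also carry it; irrelevant labels stay at 0.
    """
    S = fuzzy_similarity(X)
    Y = np.asarray(Y, dtype=float)
    support = (S @ Y) / S.sum(axis=1, keepdims=True)
    D = support * Y  # keep only the labels marked relevant
    row_sums = D.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # instances without labels stay all-zero
    return D / row_sums


def rank_features(X, D):
    """Rank features by an assumed discernibility-style significance.

    A feature scores highly when instance pairs whose label distributions
    differ are also far apart on that feature.
    """
    Z = _min_max_scale(X)
    D = np.asarray(D, dtype=float)
    # Total variation distance between label distributions, in [0, 1].
    label_gap = np.abs(D[:, None, :] - D[None, :, :]).sum(axis=2) / 2.0
    feat_gap = np.abs(Z[:, None, :] - Z[None, :, :])  # n x n x m
    total = label_gap.sum()
    if total == 0:
        scores = np.zeros(Z.shape[1])
    else:
        scores = (feat_gap * label_gap[:, :, None]).sum(axis=(0, 1)) / total
    order = np.argsort(-scores, kind="stable")  # descending significance
    return order, scores


# Toy data (made up for illustration): feature 0 tracks the labels closely.
X = np.array([[0.0, 0.5], [0.1, 0.4], [0.9, 0.5], [1.0, 0.4]])
Y = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])
D = enhance_labels(X, Y)
order, scores = rank_features(X, D)
```

In this sketch the ranking is a simple filter that scores each feature independently; whether the paper evaluates features one at a time or incrementally over feature subsets is not stated in the abstract.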
Authors
熊传镇
钱文彬
王映龙
XIONG Chuanzhen; QIAN Wenbin; WANG Yinglong (School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China; School of Software, Jiangxi Agricultural University, Nanchang 330045, China)
Source
《数据采集与处理》
CSCD
Peking University Core Journal
2021, Issue 3, pp. 529-543 (15 pages)
Journal of Data Acquisition and Processing
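Funding
Supported by the National Natural Science Foundation of China (61966016)
Supported by the Natural Science Foundation of Jiangxi Province (20192BAB207018)
Supported by the Science and Technology Research Fund of the Jiangxi Provincial Department of Education (GJJ180200)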
Keywords
feature selection
granular computing
rough set
label distribution
fuzzy discernibility