Abstract
Multi-label feature selection is an active research topic in machine learning and artificial intelligence. Existing research on multi-label learning assumes that the relevant labels of each instance are uniformly distributed, i.e., that all relevant labels of an instance are equally important. In many application domains, however, the relevant labels differ in importance. To address this, this paper proposes a label enhancement method that transforms the traditional logical labels in multi-label data into label distributions, which carry richer supervisory information. From a cost-sensitive learning perspective, it then constructs a feature significance measure based on feature cost and feature dependency degree, and on this basis designs a cost-sensitive feature selection algorithm for label distribution data. Finally, experimental comparison and analysis on real-world multi-label data sets verify the effectiveness and feasibility of the algorithm.
Authors
黄锦涛
钱文彬
王映龙
HUANG Jin-tao; QIAN Wen-bin; WANG Ying-long (School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China; School of Software, Jiangxi Agricultural University, Nanchang 330045, China)
Source
《小型微型计算机系统》
CSCD
PKU Core Journal (北大核心)
2020, No. 4, pp. 685-691 (7 pages)
Journal of Chinese Computer Systems
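Funding
Supported by the National Natural Science Foundation of China (61966016, 61502213)
Supported by the Natural Science Foundation of Jiangxi Province (20192BAB207018)
Supported by the Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ180200)
Supported by the Jiangxi Provincial Graduate Innovation Special Fund (YC2018-S192)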
Keywords
feature selection
rough sets
attribute reduction
multi-label learning
cost-sensitive
label enhancement