ReliefF Based Pruning Model for Multi-Label Classification (cited by: 9)


Abstract: Multi-label classification (MLC) is a machine learning problem in which a model assigns a subset of labels to each instance. MLC is receiving increasing attention and is relevant to many domains, such as text categorization, classification of music and videos, and semantic annotation of images. Many recent studies seek efficient and accurate algorithms for MLC; these are usually partitioned into two main categories: algorithm adaptation and problem transformation. In MLC, labels do not occur independently of each other; instead, there are statistical dependencies between them, and it is commonly accepted that exploiting these dependencies is the key to improving classification performance.

In this paper, we divide methods for exploiting label dependency into two groups, from the perspective of the problem transformation they use: label grouping models and feature space extending models. Label grouping models group labels into several label subsets according to certain strategies or criteria in order to incorporate label dependencies. Feature space extending models extend the feature space of the binary classifiers so that the classifiers can discover existing label dependencies by themselves. We find that the common difficulty for both kinds of model is how to measure the dependencies between labels accurately. In particular, for feature space extending models, choosing the proper labels with which to extend the original feature space is the key to improving classification performance.

On this basis, we propose a ReliefF based pruning model for multi-label classification (ReliefF based Stacking, RFS). RFS measures the dependencies between labels from a feature selection perspective, and then adds the more strongly dependent labels to the original feature space. A stacking based algorithm is used during training and prediction. The contribution of this algorithm is threefold: (1) It provides a new method to measure the dependencies between labels. Unlike existing methods that measure pair-wise label dependencies, our method, which is based on the ReliefF algorithm, takes into account the effect of all interacting labels. (2) Instead of extending the original feature space with all labels, we choose only the closely related labels, which reduces noise in the data and avoids the adverse effects of irrelevant labels. (3) In the feature selection phase, we design a new strategy that treats original features and label features as the same kind of feature and selects among them together.

Our empirical study is divided into two parts: a systematic study of the parameters of our algorithm, and a comparative study between our proposal and other multi-label classification algorithms. The first part discusses the effects of parameters, feature selection strategies and base classifiers on RFS. In the second part, experimental results based on six evaluation measures on nine multi-label benchmark datasets show that RFS is more effective than other advanced multi-label classification algorithms.
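The label-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels, a two-class ReliefF with Manhattan distance, and a hypothetical `threshold` parameter for pruning. The function names `relieff_weights`, `select_related_labels` and `extend_features` are made up for this illustration. For simplicity it scores only the other labels as candidate features for one target label, whereas the paper selects original features and label features together.

```python
from typing import List, Sequence


def relieff_weights(F: Sequence[Sequence[float]], y: Sequence[int], k: int = 1) -> List[float]:
    """Two-class ReliefF weights for features scaled to [0, 1]."""
    m, d = len(F), len(F[0])
    w = [0.0] * d
    for i in range(m):
        def dist(j: int) -> float:
            # Manhattan distance between instance i and instance j
            return sum(abs(F[i][f] - F[j][f]) for f in range(d))

        others = sorted((j for j in range(m) if j != i), key=dist)
        hits = [j for j in others if y[j] == y[i]][:k]    # nearest same-class
        misses = [j for j in others if y[j] != y[i]][:k]  # nearest other-class
        for f in range(d):
            for j in hits:    # differing from a hit lowers the weight
                w[f] -= abs(F[i][f] - F[j][f]) / (m * len(hits))
            for j in misses:  # differing from a miss raises the weight
                w[f] += abs(F[i][f] - F[j][f]) / (m * len(misses))
    return w


def select_related_labels(Y: Sequence[Sequence[int]], target: int,
                          k: int = 1, threshold: float = 0.0) -> List[int]:
    """Indices of labels whose ReliefF weight w.r.t. `target` exceeds `threshold`."""
    others = [j for j in range(len(Y[0])) if j != target]
    F = [[row[j] for j in others] for row in Y]  # other labels act as features
    y = [row[target] for row in Y]               # target label acts as the class
    w = relieff_weights(F, y, k)
    return [j for j, wj in zip(others, w) if wj > threshold]


def extend_features(X: Sequence[Sequence[float]], L: Sequence[Sequence[int]],
                    selected: Sequence[int]) -> List[List[float]]:
    """Append the selected label columns of L to each feature vector in X."""
    return [list(x) + [row[j] for j in selected] for x, row in zip(X, L)]


# Toy label matrix: label 1 copies label 0, label 2 is unrelated to label 0.
Y = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0],
     [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(select_related_labels(Y, target=0, threshold=0.5))  # [1]
```

In a stacking setup, `extend_features` would be called with the true labels as `L` when training the second-level classifier for the target label, and with first-level predictions as `L` at prediction time. That usage pattern is the general stacking recipe, assumed here rather than taken from the paper's details.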

Authors: LIU Hai-Yang; WANG Zhi-Hai; ZHANG Zhi-Dong (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044)

Affiliation: School of Computer and Information Technology, Beijing Jiaotong University

Source: Chinese Journal of Computers (《计算机学报》) [EI, CSCD, PKU Core], 2019, No. 3, pp. 483-496 (14 pages)

Funding: Supported by the National Natural Science Foundation of China (61672086, 61702030, 61771058) and the Beijing Natural Science Foundation (4182052)

Keywords: multi-label classification; label dependence; feature selection; ReliefF; Stacking

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

