Survey on Support Vector Machine Algorithms in Weakly Supervised Scenarios (弱监督场景下的支持向量机算法综述)

Abstract: Support Vector Machine (SVM) is a statistical learning method based on the principle of structural risk minimization. It offers an intuitive geometric interpretation and a rigorous mathematical derivation, and it shows unique advantages in handling nonlinear, small-sample, and high-dimensional problems. SVM has attracted significant attention and has been widely applied in fields such as image recognition, fault diagnosis, and text classification. However, SVM is a classical supervised learning algorithm: to ensure generalization ability, it trains the learner on a large number of samples with complete, unique, and unambiguous ground-truth labels. As real-world tasks become increasingly complex, building such a sample set is laborious and difficult. On the one hand, data collection, cleaning, and debugging require significant time and cost. In specific domains, especially the medical field, experts often need to apply domain knowledge to process and label the samples. On the other hand, real-world learning tasks often change and evolve. For example, data annotation criteria, annotation granularity, or downstream use cases may change frequently, so that samples must be re-labeled. Consequently, because of the high cost of labeling, a large number of samples in real-world applications lack complete and unambiguous labels. Moreover, samples in most practical tasks may be polysemous, that is, a single sample can be associated with multiple labels at the same time. Standard SVM therefore struggles to achieve satisfactory performance in weakly supervised scenarios such as incomplete supervision, inexact supervision, and polysemous supervision.

Weakly supervised scenarios are defined in contrast with supervised scenarios: learning algorithms in weakly supervised scenarios train the learner on samples whose labels may be limited, ambiguous, or only coarse. From the perspective of weakly supervised scenarios, this survey systematically reviews the current research status and development of SVM algorithms. First, the concept of weakly supervised scenarios and the basic mathematical principles of SVM are briefly introduced. Second, the existing SVM algorithms for weakly supervised scenarios are divided into three categories according to learning paradigm: methods based on semi-supervised learning, methods based on multiple instance learning, and methods based on multi-label learning. Specifically, the semi-supervised learning based methods are subdivided, according to their data assumptions, into approaches based on the clustering assumption and approaches based on the manifold assumption. The multiple instance learning based methods are subdivided, according to their solution strategies, into instance-level, bag-level, and embedded-space approaches. The multi-label learning based methods are subdivided, according to their processing ideas, into problem transformation approaches and algorithm adaptation approaches. The survey introduces the representative methods in each category in detail and summarizes and analyzes their characteristics and shortcomings, providing a basis for selecting SVM methods in different task scenarios.

The performance of some representative algorithms is then evaluated and analyzed through experiments on publicly available datasets. Finally, potential research directions for SVM algorithms in weakly supervised scenarios are discussed, including data imbalance, weakly supervised regression, mixed weakly supervised learning, large-scale and deep-level tasks, and learning problems in open environments.

Authors: DING Shi-Fei (丁世飞); SUN Yu-Ting (孙玉婷); LIANG Zhi-Zhen (梁志贞); GUO Li-Li (郭丽丽); ZHANG Jian (张健); XU Xiao (徐晓). School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; Mine Digitization Engineering Research Center of the Ministry of Education (China University of Mining and Technology), Xuzhou, Jiangsu 221116

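Affiliations: School of Computer Science and Technology, China University of Mining and Technology; Mine Digitization Engineering Research Center of the Ministry of Education (China University of Mining and Technology)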

Source: Chinese Journal of Computers (《计算机学报》), 2024, No. 5, pp. 987-1009 (23 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).

Funding: Supported by the National Natural Science Foundation of China (62276265, 61976216, 62206297, 62206296).

Keywords: weakly supervised scenarios; support vector machine (SVM); semi-supervised learning; multiple instance learning; multi-label learning

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
