Adaptive Density Distribution Mining and Tri-perspective Embedding Ensemble for Multi-instance Learning


Abstract: Multi-instance learning (MIL) operates on bags that each contain several instances; the bags are labeled, while the instances usually are not. The main task of MIL is to learn the feature information of the available bags in order to train a classifier. Embedding-based MIL methods select representative instances and use them to embed each bag into a new feature space. However, most existing algorithms struggle to adapt to diverse data distributions, and a single-perspective embedding may produce vectors with weak feature values in the new feature space. This paper proposes ADTE, an adaptive density distribution mining and tri-perspective embedding ensemble algorithm for MIL, which consists of three key techniques. (1) The adaptive density distribution instance selection technique mines the density distribution of the negative instance space and clusters dense, connected core instances into clusters of arbitrary shape, yielding a set of negative representative instances; a set of positive representative instances is then obtained by minimizing the similarity between positive and negative instances. (2) The tri-perspective embedding technique mines the positive, negative, and overall feature information of each bag and converts the bag into a single vector under each of the three perspectives. (3) The ensemble technique trains three single-instance classifiers, one on the vectors from each perspective, and combines them by hard voting to obtain the final MIL model. In the experiments, ADTE was evaluated on 30 datasets from four domains and compared with seven state-of-the-art MIL algorithms. The results show that ADTE achieves a higher average accuracy across the datasets than the compared algorithms, with particularly good results on the text classification and web recommendation datasets.
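The three steps in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract gives no formulas, so every concrete choice here is an assumption. The negative representatives come from a fixed-radius DBSCAN-style grouping (`eps`, `min_pts`) rather than the paper's adaptive density method, the positive representatives are the `k` positive-bag instances farthest from the negative representatives, each view embeds a bag as its minimum distance to every representative, the "overall" view simply concatenates both representative sets, and a nearest-centroid rule stands in for the single-instance classifier. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def density_representatives(points, eps, min_pts):
    """DBSCAN-style stand-in: link connected core points, keep the densest per cluster."""
    nbrs = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    core = {i for i, nb in enumerate(nbrs) if len(nb) >= min_pts}
    seen, reps = set(), []
    for i in sorted(core):
        if i in seen:
            continue
        cluster, stack = [], [i]
        seen.add(i)
        while stack:  # depth-first walk over density-connected core points
            u = stack.pop()
            cluster.append(u)
            for v in nbrs[u]:
                if v in core and v not in seen:
                    seen.add(v)
                    stack.append(v)
        reps.append(points[max(cluster, key=lambda u: len(nbrs[u]))])
    return reps  # empty if eps/min_pts yield no core point

def positive_representatives(pos_instances, neg_reps, k):
    """Assumed rule: keep the k instances least similar to (farthest from) the negatives."""
    score = lambda p: min(math.dist(p, r) for r in neg_reps)
    return sorted(pos_instances, key=score, reverse=True)[:k]

def embed(bag, reps):
    """Map a bag to one vector: its minimum distance to each representative."""
    return [min(math.dist(x, r) for x in bag) for r in reps]

class NearestCentroid:
    """Tiny stand-in for a single-instance classifier."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

def fit_tri_view(bags, labels, eps=1.0, min_pts=2, k=2):
    neg = [x for b, y in zip(bags, labels) if y == 0 for x in b]
    pos = [x for b, y in zip(bags, labels) if y == 1 for x in b]
    neg_reps = density_representatives(neg, eps, min_pts)
    pos_reps = positive_representatives(pos, neg_reps, k)
    views = [
        lambda b: embed(b, neg_reps),             # negative perspective
        lambda b: embed(b, pos_reps),             # positive perspective
        lambda b: embed(b, neg_reps + pos_reps),  # overall perspective (assumed)
    ]
    models = [NearestCentroid().fit([v(b) for b in bags], labels) for v in views]
    return views, models

def predict(bag, views, models):
    """Hard voting: the majority label of the three per-view classifiers wins."""
    votes = [m.predict(v(bag)) for v, m in zip(views, models)]
    return Counter(votes).most_common(1)[0][0]

# Toy data: negative instances sit near the origin; positive bags also
# contain at least one instance near (10, 0).
bags = [
    [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    [(0.1, 0.1), (0.3, 0.2)],
    [(0.2, 0.0), (10.0, 0.0)],
    [(0.0, 0.2), (10.2, 0.1), (9.9, 0.2)],
]
labels = [0, 0, 1, 1]
views, models = fit_tri_view(bags, labels)
```

On this toy data the negative-perspective classifier alone would label the bag `[(0.1, 0.0), (10.1, 0.0)]` as negative, because the bag contains an instance close to the negative cluster; the other two perspectives outvote it, which is the kind of disagreement the hard-voting step is meant to resolve.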

Authors: CHEN Tianlin (陈天霖); YANG Mei (杨梅); MIN Fan (闵帆); FANG Yu (方宇)

Affiliations: School of Computer Science, Southwest Petroleum University, Chengdu 610500, China; Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu 610500, China; Lab of Machine Learning, Southwest Petroleum University, Chengdu 610500, China

Source: Journal of Kunming University of Science and Technology (Natural Science) (《昆明理工大学学报（自然科学版）》), PKU Core Journal, 2023, No. 6, pp. 54-65 (12 pages)

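Funding: National Natural Science Foundation of China (62006200); Central Government Funds for Guiding Local Science and Technology Development (2021ZYD0003); Natural Science Foundation of Sichuan Province (2019YJ0314); Open Project of the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province (OBDMA202102).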

Keywords: adaptive density clustering; instance selection; multi-instance learning; tri-perspective embedding

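Classification codes: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]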


