Abstract
Multi-instance learning (MIL) aims to train an effective classifier for bags, which have a complex data structure. Each bag corresponds to one sample and consists of multiple instances that describe the sample's features. Under the standard MIL assumption, a bag is positive if it contains at least one positive instance and negative otherwise. Existing MIL algorithms usually either treat the bag as a whole or learn over the entire instance space. However, datasets usually contain noise, which affects the classification results. This paper proposes a two-level granulation and spatial transformation method for semi-supervised multi-instance classification (TSSM). First, at the single-bag granularity level, a denoising technique based on density and distance yields bags with more prominent feature values. Second, at the dataset granularity level, a key bag selection technique yields key bags that are more globally representative. Finally, a spatial transformation technique based on the key bags produces a new data embedding, which is used to build a more accurate classifier. Experimental results show that TSSM outperforms most MIL classification algorithms.
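The standard MIL assumption and the idea of embedding bags relative to key bags can be illustrated with a minimal Python sketch. This is not the TSSM implementation: the abstract does not state which distance metric is used or how key bags are chosen, so the average Hausdorff distance and the function names `bag_label`, `bag_distance`, and `embed` are assumptions made purely for illustration, and the denoising and key bag selection steps are omitted.

```python
import numpy as np

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff any instance is positive."""
    return int(any(instance_labels))

def bag_distance(bag_a, bag_b):
    """Average Hausdorff distance between two bags (assumed metric, not from the paper)."""
    # Pairwise Euclidean distances, shape (len(bag_a), len(bag_b)).
    d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=2)
    # Sum each instance's distance to its nearest neighbour in the other bag.
    total = d.min(axis=1).sum() + d.min(axis=0).sum()
    return total / (len(bag_a) + len(bag_b))

def embed(bags, key_bags):
    """Map each bag to a vector of its distances to the key bags."""
    return np.array([[bag_distance(b, k) for k in key_bags] for b in bags])

# Toy data: two bags of 2-D instances; the first bag is used as the only key bag.
bags = [np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([[3.0, 4.0]])]
key_bags = [bags[0]]
X = embed(bags, key_bags)  # one row per bag, one column per key bag
```

Each row of `X` is a fixed-length vector, so any ordinary single-instance classifier can be trained on it regardless of how many instances each bag contains.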
Authors
杨梅
唐文韬
王轩
闵帆
YANG Mei; TANG Wen-tao; WANG Xuan; MIN Fan (School of Computer Science, Southwest Petroleum University, Chengdu 610500, China; Network and Information Center, Southwest Petroleum University, Chengdu 610500, China; Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu 610500, China)
Source
Fuzzy Systems and Mathematics (《模糊系统与数学》)
Peking University Core Journal (北大核心)
2022, Issue 1, pp. 110-119 (10 pages)
Funding
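National Natural Science Foundation of China (62006200)
Natural Science Foundation of Sichuan Province (2019YJ0314)
Sichuan Province Youth Science and Technology Innovation Team Project (2019JDTD0017)
Southwest Petroleum University Graduate English-Taught Course Construction Project (2020QY04)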
Keywords
Multi-instance Learning
Semi-supervised Learning
Granulation
Denoising
Spatial Transformation