Abstract
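Previous semi-supervised multi-instance learning algorithms usually decompose unlabeled bags into sets of instances and apply conventional semi-supervised single-instance learning algorithms to infer the latent labels of those instances so that they can be exploited. However, such methods assume that the classification of multi-instance samples is closely tied to their probability density distribution, and they do not consider the influence of bag structure on bag labels. This paper proposes a bag-level semi-supervised multi-instance kernel learning method that uses unlabeled bags directly to train the semi-supervised learner. The instance space is first clustered to convert each bag into a concept-vector representation. The Hamming distances between concept vectors are then computed, a graph Laplacian matrix describing bag-level smoothness is built from those distances, and a bag-level semi-supervised kernel is derived from the Laplacian. Finally, the algorithm is tested on standard multi-instance learning benchmark data sets and on an image data set. The tests show a clear improvement.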
In previous semi-supervised multi-instance learning, unlabeled bags are often decomposed into sets of instances, and conventional single-instance semi-supervised learning algorithms are then applied to exploit these unlabeled samples. However, these algorithms consider only the instance-level density distribution and ignore the structure of individual bags. We propose a bag-level semi-supervised multi-instance kernel learning algorithm that uses unlabeled bags directly in the learning procedure. A representation transformation is applied to generate a concept-vector representation of each bag. The proposed algorithm is tested on the multi-instance learning benchmark data sets Musk1 and Musk2 and on the Corel Image 2000 data set. The evaluation results indicate the effectiveness of the proposed algorithm.
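The pipeline outlined in the abstract (cluster the instance space, encode each bag as a concept vector, take Hamming distances, build a graph Laplacian, derive a bag-level kernel) can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption rather than the authors' method: plain k-means for clustering, binary concept vectors, an exponential affinity `exp(-H / sigma)` on Hamming distance, and the regularized Laplacian kernel `(I + gamma * L)^{-1}` as a stand-in for the semi-supervised kernel. The function names and the parameters `sigma` and `gamma` are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns k centres."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every instance to its nearest centre.
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(0)
    return centres

def concept_vectors(bags, centres):
    """Binary vector per bag: entry j is 1 if some instance lies in cluster j."""
    V = np.zeros((len(bags), len(centres)), dtype=int)
    for i, bag in enumerate(bags):
        d = ((bag[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        V[i, np.unique(d.argmin(1))] = 1
    return V

def hamming_matrix(V):
    """Pairwise Hamming distances between concept vectors."""
    return (V[:, None, :] != V[None, :, :]).sum(-1)

def bag_level_kernel(V, sigma=1.0, gamma=1.0):
    """Graph Laplacian over bags and an assumed regularized Laplacian kernel."""
    H = hamming_matrix(V)
    W = np.exp(-H / sigma)          # assumed affinity: closer bags, larger weight
    np.fill_diagonal(W, 0.0)        # no self-loops
    L = np.diag(W.sum(1)) - W       # unnormalized graph Laplacian
    K = np.linalg.inv(np.eye(len(V)) + gamma * L)
    return K, L

# Toy data: 8 bags (labeled and unlabeled alike), 2 to 5 instances each, 3 features.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(rng.integers(2, 6), 3)) for _ in range(8)]
centres = kmeans(np.vstack(bags), k=4)
V = concept_vectors(bags, centres)
K, L = bag_level_kernel(V)
```

Because the graph is built over all bags, labeled and unlabeled, the resulting matrix `K` is defined only on the bags seen during construction. It could be passed to any kernel method that accepts a precomputed Gram matrix.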
Source
Automation & Information Engineering (《自动化与信息工程》)
2013, No. 5, pp. 1-6 (6 pages)
Funding
Guangdong Province Science and Technology Project (2011B04020000, 2012A010701013)
Guangzhou Science and Technology Project (11A31090341, 11A53010726, 2011Y5-00004)
Keywords
Multi-Instance Learning
Semi-Supervised Learning
Multi-Instance Kernel
Bag-Level Smoothness
Graph Laplacian
Kernel Mapping