Abstract
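A multi-instance kernel learning method is proposed that accounts for the importance of the instances in a bag within the concept space. For each instance in a bag, it introduces a weight vector over the concepts contained in the whole sample space, quantifying how strongly the instance belongs to each concept. The main steps are: a) cluster all instances to obtain clusters that reflect the concepts contained in the multi-instance bags; b) borrow the r-pattern from text classification to compute, for each instance, a weight vector over the concepts in the concept space; c) combine the instance weights through cosine similarity inside the multi-instance kernel, yielding a multi-instance concept kernel that better reflects the characteristics of the concept space. The method considers both bag-level concepts and instance-level weights, and can effectively measure the influence of the instances in a bag on the final bag label. Because it is built on the multi-instance kernel, it applies to a variety of multi-instance learning settings. Experiments on standard datasets and image datasets show that the algorithm is effective.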
A multi-instance (MI) kernel learning algorithm is proposed that uses instance weights derived from a concept space obtained by clustering, together with concept weights defined over the whole instance space. This provides a potential way to quantify the importance of an individual instance to a given concept. The main steps of the proposed algorithm are: a) generate concepts by clustering in the instance space; b) adopting an r-Pattern-like procedure, compute each instance's weights with respect to every concept from step a); c) compute the weighted MI kernel matrix using the cosine similarity of the instance weight vectors. The proposed algorithm takes both bag-level concepts and instance-level weights into account, which can effectively measure the importance of different potential concepts in the instance space. Moreover, the algorithm is based directly on the MI kernel, and a rigorous proof of its consistency with the well-known MI metadata assumption is given. Experiments on the benchmark datasets Musk1 and Musk2, as well as on a well-known real-life image dataset, demonstrate its effectiveness.
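The three steps a) to c) can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation. The abstract does not specify the r-Pattern weighting, so `concept_weights` below substitutes a softmax over negative centroid distances as a stand-in. The RBF base kernel, the sum-over-pairs set kernel, and all function names are likewise assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Step a): plain Lloyd's k-means; the k centroids play the role of 'concepts'."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(group) for c in zip(*group)]
    return centroids


def concept_weights(x, centroids):
    """Step b): ASSUMPTION, a softmax of negative distances replaces the r-Pattern weights."""
    scores = [math.exp(-math.dist(x, c)) for c in centroids]
    total = sum(scores)
    return [s / total for s in scores]


def cosine(u, v):
    """Cosine similarity of two weight vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rbf(x, y, gamma=1.0):
    """ASSUMPTION: an RBF kernel is used as the instance-level base kernel."""
    return math.exp(-gamma * math.dist(x, y) ** 2)


def concept_kernel(bag_a, bag_b, centroids, gamma=1.0):
    """Step c): sum over instance pairs of base kernel times weight-vector cosine similarity."""
    weights_b = [concept_weights(y, centroids) for y in bag_b]
    total = 0.0
    for x in bag_a:
        wx = concept_weights(x, centroids)
        for y, wy in zip(bag_b, weights_b):
            total += cosine(wx, wy) * rbf(x, y, gamma)
    return total


# Toy usage: three bags of 2-D instances (made-up data, for illustration only).
bags = [
    [(0.0, 0.0), (0.1, 0.2)],
    [(5.0, 5.0), (5.1, 4.9)],
    [(0.2, 0.1), (5.0, 5.2)],
]
instances = [x for bag in bags for x in bag]
concepts = kmeans(instances, k=2)
gram = [[concept_kernel(a, b, concepts) for b in bags] for a in bags]
```

Multiplying the base kernel by the cosine term keeps the result a valid kernel, since the product of two positive semidefinite kernels is itself positive semidefinite. The resulting `gram` matrix could then be passed to any kernel classifier that accepts a precomputed kernel.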
Source
《科学技术与工程》
PKU Core (Peking University core journal index)
2012, No. 30, pp. 7931-7936 (6 pages)
Science Technology and Engineering
Funding
Supported by the National Natural Science Foundation of China (61033010, U0935002)
and the National Science and Technology Program (2008ZX10005-013)
Keywords
multi-instance learning
multi-instance concept
instance weight
r-Pattern
multi-instance kernel