Abstract
Multi-instance learning is a new machine learning framework. The large amount of noise in positive bags makes multi-instance data sets highly ambiguous. To exclude the many false positive instances in positive bags and improve classification accuracy, this paper combines the neighborhood covering algorithm with k-nearest neighbors and proposes a new bag-level Covering-kNN algorithm for multi-instance learning. The covering algorithm learns a set of spherical neighborhoods, each containing only samples of the same class; this property is used to reorganize the bag structure of the multi-instance data set. Specifically, the original bag structure is first reconstructed by treating the spherical neighborhoods produced by the covering algorithm as the new bags, which improves the separability of multi-instance samples in the new feature space. A bag-level kNN algorithm is then used to exclude the noise in positive bags and predict the labels of test bags. Experiments show that the proposed bag-level Covering-kNN algorithm performs well.
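The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the radius rule, the bag distance, or the noise-exclusion criterion, so the code below assumes Euclidean distance, a cover radius halfway between the nearest different-label instance and the farthest closer same-label instance, the minimal Hausdorff distance between bags, and a plain majority vote. The function names `build_covers`, `min_hausdorff`, and `predict_bag` are hypothetical, and each training instance is assumed to inherit the label of its original bag.

```python
import math
from collections import Counter


def build_covers(points, labels):
    """Greedily cover the instances with spheres that each hold one label only.

    Returns a list of (member_points, label) pairs, one per sphere.
    """
    uncovered = set(range(len(points)))
    covers = []
    while uncovered:
        c = min(uncovered)  # assumption: lowest uncovered index becomes the center
        dists = [math.dist(points[c], p) for p in points]
        # Distance from the center to the nearest instance with a different label.
        d_other = min((d for d, y in zip(dists, labels) if y != labels[c]),
                      default=math.inf)
        # Farthest same-label instance that is still closer than d_other.
        d_same = max((d for d, y in zip(dists, labels)
                      if y == labels[c] and d < d_other), default=0.0)
        # Assumption: the radius sits halfway between the two distances.
        radius = d_same if math.isinf(d_other) else (d_other + d_same) / 2
        members = [j for j, (d, y) in enumerate(zip(dists, labels))
                   if y == labels[c] and d <= radius]
        covers.append(([points[j] for j in members], labels[c]))
        uncovered -= set(members) | {c}
    return covers


def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: the smallest instance-to-instance distance."""
    return min(math.dist(a, b) for a in bag_a for b in bag_b)


def predict_bag(test_bag, covers, k=3):
    """Label a test bag by majority vote among its k nearest covers (new bags)."""
    nearest = sorted(covers, key=lambda cov: min_hausdorff(test_bag, cov[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: label 1 = instance from a positive bag, 0 = from a negative bag.
# The point (0.5, 0.5) plays the role of a false positive among negatives.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
covers = build_covers(points, labels)
print(predict_bag([(5.2, 5.2)], covers, k=1))
```

In this toy data the false positive ends up alone in its own sphere, separated from the genuinely positive instances, which is the kind of restructuring the abstract refers to. How the paper then uses bag-level kNN to discard such noisy covers is not specified in the abstract, so that step is left out here.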
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
PKU Core (北大核心)
2014, No. 11, pp. 2511-2514 (4 pages)
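Funding
Supported by the National Natural Science Foundation of China (61073117, 61175046)
Supported by the Anhui Provincial Department of Education fund project (kJ2013A016)
Supported by the Anhui University graduate academic innovation project (10117700183)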
Keywords
machine learning
multi-instance learning
covering algorithm
kNN algorithm