Solving Multi-instance Learning Problem with Evaluating the Importance of Concept in Instances

Abstract: In multi-instance learning, each labeled example in the training set is a bag made up of multiple unlabeled instances, and the goal is to learn a classifier from this training set that correctly labels unseen bags. Previous research on multi-instance learning has either modified existing single-instance learning algorithms to fit the multi-instance representation, or proposed new methods that mine the relationship between instances and bags and use the result to solve the problem. Starting from a change in how bags are represented, this paper proposes a new algorithm for multi-instance learning: the concept evaluating algorithm. First, the algorithm uses a clustering algorithm to group all instances into d clusters, each of which can be regarded as a concept contained in the instances. Next, it applies TF-IDF (term frequency-inverse document frequency), originally designed for text retrieval, to evaluate the importance of each concept in each bag. Finally, each bag is re-represented as a d-dimensional vector, the concept evaluating vector, whose i-th component is the importance in that bag of the concept represented by the i-th cluster. After re-representation the data set is no longer "multi-instance", so existing propositional single-instance learning algorithms can be used to solve multi-instance learning problems efficiently.
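The three steps in the abstract can be illustrated with a minimal sketch. It treats instances as "words", clusters as "terms", and bags as "documents", and computes tf_ij = n_ij / |B_i| and idf_j = ln(N / df_j), where n_ij counts the instances of bag B_i assigned to cluster j, N is the number of bags, and df_j is the number of bags containing at least one instance of cluster j. The abstract names neither the clustering algorithm nor the exact TF-IDF variant, so both choices here are assumptions: k-means (Lloyd's algorithm with a deterministic farthest-first initialization) and the common tf × log(N/df) weighting. The function names `kmeans` and `concept_vectors` are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Bag = List[Vector]  # assumption: a bag is a non-empty list of instance feature vectors


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the closest centroid; ties go to the lowest index."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points: List[Vector], d: int, iters: int = 100) -> List[Vector]:
    """Step 1 (assumed k-means): cluster all instances into d concepts.

    Farthest-first initialisation keeps the sketch deterministic; it assumes
    the data contain at least d distinct instances.
    """
    centroids = [list(points[0])]
    while len(centroids) < d:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(d)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return centroids


def concept_vectors(bags: List[Bag], centroids: List[Vector]) -> List[Vector]:
    """Steps 2-3: map each bag to a d-dimensional TF-IDF concept vector."""
    d, n_bags = len(centroids), len(bags)
    # counts[i][j]: instances of bag i assigned to concept j (the "term count")
    counts = []
    for bag in bags:
        c = [0] * d
        for inst in bag:
            c[nearest(inst, centroids)] += 1
        counts.append(c)
    # df[j]: number of bags containing concept j (the "document frequency")
    df = [sum(1 for c in counts if c[j] > 0) for j in range(d)]
    idf = [math.log(n_bags / df[j]) if df[j] else 0.0 for j in range(d)]
    # Component j of bag i = (n_ij / |B_i|) * idf_j
    return [[(c[j] / len(bag)) * idf[j] for j in range(d)]
            for bag, c in zip(bags, counts)]


# Toy data: three 2-D bags; instances near (5, 5) form a concept absent from bag 2.
bags = [
    [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]],
    [[0.0, 0.1], [0.1, 0.1]],
    [[5.1, 5.0], [0.0, 0.0]],
]
centroids = kmeans([inst for bag in bags for inst in bag], d=2)
for vec in concept_vectors(bags, centroids):
    print([round(x, 4) for x in vec])
```

In this toy run the concept near the origin occurs in every bag, so its idf, and therefore its component, is 0 for all bags; only the rarer concept carries weight. Each resulting vector, paired with its bag label, can then be passed to an ordinary single-instance classifier such as an SVM.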

Authors: Gan Rui, Yin Jian

Affiliation: School of Information Science and Technology, Sun Yat-sen University

Source: Computer Science (《计算机科学》), 2012, No. 7, pp. 144-147 (4 pages); indexed in CSCD and the Peking University core journal list

Keywords: multi-instance learning, re-representation, single-instance learning, concept evaluating

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
