
Combining Committee-Based Semi-Supervised Learning and Active Learning (Cited by: 6)

Abstract: Many data mining applications have a large amount of data, but labeling the data is usually difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that assumes each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on each example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
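The self-labeling loop the abstract describes lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of a CoBC-style loop, not the authors' exact algorithm: a random-subspace committee of decision trees labels its most confident unlabeled examples each round, where confidence combines committee agreement with a simplified stand-in for the paper's local-accuracy measure (committee accuracy on the labeled nearest neighbors of the candidate). All function names (`random_subspace_ensemble`, `committee_predict`, `local_confidence`, `cobc`) and parameter choices are illustrative assumptions; NumPy and scikit-learn are assumed available.

```python
# Hypothetical sketch of a CoBC-style self-labeling loop (illustrative only,
# not the authors' exact algorithm). Assumes NumPy and scikit-learn.
# Class labels are assumed to be non-negative integers.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors


def random_subspace_ensemble(X, y, n_members=6, subspace_frac=0.5, rng=None):
    """Train one decision tree per random feature subset (random subspace method)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_feats = X.shape[1]
    k = max(1, int(subspace_frac * n_feats))
    members = []
    for _ in range(n_members):
        feats = rng.choice(n_feats, size=k, replace=False)
        members.append((DecisionTreeClassifier().fit(X[:, feats], y), feats))
    return members


def committee_predict(members, X):
    """Majority vote of the committee, plus per-example agreement in [0, 1]."""
    votes = np.stack([clf.predict(X[:, feats]) for clf, feats in members])
    preds = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    agreement = (votes == preds).mean(axis=0)
    return preds, agreement


def local_confidence(members, X_lab, y_lab, x, k=5):
    """Simplified stand-in for the paper's local-accuracy measure:
    committee accuracy on the k labeled nearest neighbors of x."""
    nn = NearestNeighbors(n_neighbors=min(k, len(X_lab))).fit(X_lab)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    preds, _ = committee_predict(members, X_lab[idx])
    return (preds == y_lab[idx]).mean()


def cobc(X_lab, y_lab, X_unlab, n_iter=10, per_iter=5, threshold=0.7):
    """Each round: retrain the committee, score unlabeled examples by
    agreement x local accuracy, and self-label the most confident ones."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_iter):
        if len(X_unlab) == 0:
            break
        members = random_subspace_ensemble(X_lab, y_lab)
        preds, agree = committee_predict(members, X_unlab)
        conf = np.array([agree[i] * local_confidence(members, X_lab, y_lab, X_unlab[i])
                         for i in range(len(X_unlab))])
        pick = np.argsort(conf)[-per_iter:]           # most confident candidates
        pick = pick[conf[pick] >= threshold]          # keep only those above threshold
        if len(pick) == 0:
            break                                     # nothing confident enough; stop
        X_lab = np.vstack([X_lab, X_unlab[pick]])
        y_lab = np.concatenate([y_lab, preds[pick]])
        X_unlab = np.delete(X_unlab, pick, axis=0)
    return random_subspace_ensemble(X_lab, y_lab)     # final committee
```

The naming in the abstract suggests that QBC-then-CoBC first spends the human-annotation budget with Query-by-Committee (querying the unlabeled examples the committee disagrees on most) and then runs a loop like `cobc` above, while QBC-with-CoBC interleaves the two; the precise schedules are defined in the paper itself.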
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD indexed), 2010, No. 4, pp. 681-698 (18 pages).
Funding: partially supported by the Transregional Collaborative Research Centre SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems" funded by the German Research Foundation (DFG), and by a scholarship from the German Academic Exchange Service (DAAD).
Keywords: data mining, classification, active learning, Co-Training, semi-supervised learning, ensemble learning, random subspace method, decision tree, nearest neighbor classifier

References (34)

  • 1 Zhou Z H, Chen K J, Jiang Y. Exploiting unlabeled data in content-based image retrieval. In Proc. the 15th European Conf. Machine Learning (ECML 2004), Pisa, Italy, Sept. 20-24, 2004, pp.525-536.
  • 2 Li M, Zhou Z H. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans. Systems, Man and Cybernetics - Part A: Systems and Humans, 2007, 37(6): 1088-1098.
  • 3 Levin A, Viola P, Freund Y. Unsupervised improvement of visual detectors using Co-Training. In Proc. the Int. Conf. Computer Vision, Graz, Austria, April 1-3, 2003, pp.626-633.
  • 4 Nigam K, McCallum A K, Thrun S, Mitchell T. Text classification from labeled and unlabeled documents using EM. Machine Learning, 2000, 39(2/3): 103-134.
  • 5 Kiritchenko S, Matwin S. Email classification with Co-Training. In Proc. the 2001 Conf. the Centre for Advanced Studies on Collaborative Research (CASCON 2001), Toronto, Canada, Nov. 5-7, 2001, pp.8-19.
  • 6 Nigam K, Ghani R. Analyzing the effectiveness and applicability of Co-Training. In Proc. the 9th Int. Conf. Information and Knowledge Management, McLean, USA, Nov. 6-11, 2000, pp.86-93.
  • 7 Lewis D D, Gale W A. A sequential algorithm for training text classifiers. In Proc. the 17th ACM Int. Conf. Research and Development in Information Retrieval (SIGIR 1994), Dublin, Ireland, July 3-6, 1994, pp.3-12.
  • 8 Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38.
  • 9 Blum A, Mitchell T. Combining labeled and unlabeled data with Co-Training. In Proc. the 11th Annual Conf. Computational Learning Theory (COLT 1998), Madison, USA, July 24-26, 1998, pp.92-100.
  • 10 Muslea I, Minton S, Knoblock C A. Selective sampling with redundant views. In Proc. the 17th National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, USA, Jul. 30-Aug. 3, 2000, pp.621-626.

Co-cited references (54)

  • 1 Deng Chao, Guo Maozu. A Tri-training algorithm based on an adaptive data editing strategy [J]. Chinese Journal of Computers, 2007, 30(8): 1213-1226. Cited by: 15.
  • 2 Zhu Xiaojin. Semi-supervised Learning Literature Survey [R]. Department of Computer Sciences, University of Wisconsin at Madison, Tech. Rep. 1530, 2008.
  • 3 Zhou Zhihua, Zhan Dechuan, Yang Qiang. Semi-supervised Learning with Very Few Labeled Training Examples [C]//Proceedings of the 22nd AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2007: 675-680.
  • 4 Seeger M. Learning with Labeled and Unlabeled Data [R]. Institute for Adaptive and Neural Computation, University of Edinburgh, Tech. Rep. EPFL-REPORT-161327, 2002.
  • 5 Lewis D, Gale W. A Sequential Algorithm for Training Text Classifiers [C]//Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval. Dublin, Ireland: ACM Press, 1994: 3-12.
  • 6 Seung H S, Opper M, Sompolinsky H. Query by Committee [C]//Proceedings of the 5th ACM Workshop on Computational Learning Theory. Pittsburgh, USA: ACM Press, 1992: 287-294.
  • 7 Freund Y, Seung H S, Shamir E, et al. Selective Sampling Using the Query by Committee Algorithm [J]. Machine Learning, 1997, 28(2/3): 133-168.
  • 8 McCallum A K, Nigam K. Employing EM and Pool-based Active Learning for Text Classification [C]//Proceedings of the 15th International Conference on Machine Learning. Madison, USA: [s. n.], 1998: 350-358.
  • 9 Muslea I, Minton S, Knoblock C A. Active+Semi-supervised Learning=Robust Multi-view Learning [C]//Proceedings of the 19th International Conference on Machine Learning. Sydney, Australia: [s. n.], 2002: 435-442.
  • 10 Muslea I, Minton S, Knoblock C A. Selective Sampling with Redundant Views [C]//Proceedings of the 17th International Conference on Machine Learning. Stanford, USA: [s. n.], 2000: 621-626.

Citing literature (6)

Secondary citing literature (15)
