
Combining Committee-Based Semi-Supervised Learning and Active Learning (Cited by: 6)

Abstract: Many data mining applications have a large amount of data, but labeling the data is usually difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that assumes each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on each example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
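The self-labeling loop the abstract describes lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of a CoBC-style loop, not the authors' exact algorithm: a random-subspace committee of decision trees labels its most confident unlabeled examples each round, where confidence combines committee agreement with a simplified stand-in for the paper's local-accuracy measure (committee accuracy on the labeled nearest neighbors of the candidate). All function names (`random_subspace_ensemble`, `committee_predict`, `local_confidence`, `cobc`) and parameter choices are illustrative assumptions; NumPy and scikit-learn are assumed available.

```python
# Hypothetical sketch of a CoBC-style self-labeling loop (illustrative only,
# not the authors' exact algorithm). Assumes NumPy and scikit-learn.
# Class labels are assumed to be non-negative integers.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors


def random_subspace_ensemble(X, y, n_members=6, subspace_frac=0.5, rng=None):
    """Train one decision tree per random feature subset (random subspace method)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_feats = X.shape[1]
    k = max(1, int(subspace_frac * n_feats))
    members = []
    for _ in range(n_members):
        feats = rng.choice(n_feats, size=k, replace=False)
        members.append((DecisionTreeClassifier().fit(X[:, feats], y), feats))
    return members


def committee_predict(members, X):
    """Majority vote of the committee, plus per-example agreement in [0, 1]."""
    votes = np.stack([clf.predict(X[:, feats]) for clf, feats in members])
    preds = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    agreement = (votes == preds).mean(axis=0)
    return preds, agreement


def local_confidence(members, X_lab, y_lab, x, k=5):
    """Simplified stand-in for the paper's local-accuracy measure:
    committee accuracy on the k labeled nearest neighbors of x."""
    nn = NearestNeighbors(n_neighbors=min(k, len(X_lab))).fit(X_lab)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    preds, _ = committee_predict(members, X_lab[idx])
    return (preds == y_lab[idx]).mean()


def cobc(X_lab, y_lab, X_unlab, n_iter=10, per_iter=5, threshold=0.7):
    """Each round: retrain the committee, score unlabeled examples by
    agreement x local accuracy, and self-label the most confident ones."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_iter):
        if len(X_unlab) == 0:
            break
        members = random_subspace_ensemble(X_lab, y_lab)
        preds, agree = committee_predict(members, X_unlab)
        conf = np.array([agree[i] * local_confidence(members, X_lab, y_lab, X_unlab[i])
                         for i in range(len(X_unlab))])
        pick = np.argsort(conf)[-per_iter:]           # most confident candidates
        pick = pick[conf[pick] >= threshold]          # keep only those above threshold
        if len(pick) == 0:
            break                                     # nothing confident enough; stop
        X_lab = np.vstack([X_lab, X_unlab[pick]])
        y_lab = np.concatenate([y_lab, preds[pick]])
        X_unlab = np.delete(X_unlab, pick, axis=0)
    return random_subspace_ensemble(X_lab, y_lab)     # final committee
```

The naming in the abstract suggests that QBC-then-CoBC first spends the human-annotation budget with Query-by-Committee (querying the unlabeled examples the committee disagrees on most) and then runs a loop like `cobc` above, while QBC-with-CoBC interleaves the two; the precise schedules are defined in the paper itself.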
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD indexed), 2010, No. 4, pp. 681-698 (18 pages).
Funding: partially supported by the Transregional Collaborative Research Centre SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems" funded by the German Research Foundation (DFG), and by a scholarship from the German Academic Exchange Service (DAAD).
Keywords: data mining, classification, active learning, Co-Training, semi-supervised learning, ensemble learning, random subspace method, decision tree, nearest neighbor classifier

References (34)

  • 1 Zhou Z H, Chen K J, Jiang Y. Exploiting unlabeled data in content-based image retrieval. In Proc. the 15th European Conf. Machine Learning (ECML 2004), Pisa, Italy, Sept. 20-24, 2004, pp.525-536.
  • 2 Li M, Zhou Z H. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans. Systems, Man and Cybernetics - Part A: Systems and Humans, 2007, 37(6): 1088-1098.
  • 3 Levin A, Viola P, Freund Y. Unsupervised improvement of visual detectors using Co-Training. In Proc. the Int. Conf. Computer Vision, Graz, Austria, April 1-3, 2003, pp.626-633.
  • 4 Nigam K, McCallum A K, Thrun S, Mitchell T. Text classification from labeled and unlabeled documents using EM. Machine Learning, 2000, 39(2/3): 103-134.
  • 5 Kiritchenko S, Matwin S. Email classification with Co-Training. In Proc. the 2001 Conf. the Centre for Advanced Studies on Collaborative Research (CASCON 2001), Toronto, Canada, Nov. 5-7, 2001, pp.8-19.
  • 6 Nigam K, Ghani R. Analyzing the effectiveness and applicability of Co-Training. In Proc. the 9th Int. Conf. Information and Knowledge Management, McLean, USA, Nov. 6-11, 2000, pp.86-93.
  • 7 Lewis D D, Gale W A. A sequential algorithm for training text classifiers. In Proc. the 17th ACM Int. Conf. Research and Development in Information Retrieval (SIGIR 1994), Dublin, Ireland, July 3-6, 1994, pp.3-12.
  • 8 Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38.
  • 9 Blum A, Mitchell T. Combining labeled and unlabeled data with Co-Training. In Proc. the 11th Annual Conf. Computational Learning Theory (COLT 1998), Madison, USA, July 24-26, 1998, pp.92-100.
  • 10 Muslea I, Minton S, Knoblock C A. Selective sampling with redundant views. In Proc. the 17th National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, USA, Jul. 30-Aug. 3, 2000, pp.621-626.

Co-cited references (54)

  • 1 Deng Chao, Guo Maozu. A Tri-training algorithm based on an adaptive data editing strategy [J]. Chinese Journal of Computers, 2007, 30(8): 1213-1226. Cited by: 15.
  • 2 Zhu Xiaojin. Semi-supervised Learning Literature Survey [R]. Department of Computer Sciences, University of Wisconsin at Madison, Tech. Rep. 1530, 2008.
  • 3 Zhou Zhihua, Zhan Dechuan, Yang Qiang. Semi-supervised Learning with Very Few Labeled Training Examples [C]//Proceedings of the 22nd AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2007: 675-680.
  • 4 Seeger M. Learning with Labeled and Unlabeled Data [R]. Institute for Adaptive and Neural Computation, University of Edinburgh, Tech. Rep. EPFL-REPORT-161327, 2002.
  • 5 Lewis D, Gale W. A Sequential Algorithm for Training Text Classifiers [C]//Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval. Dublin, Ireland: ACM Press, 1994: 3-12.
  • 6 Seung H S, Opper M, Sompolinsky H. Query by Committee [C]//Proceedings of the 5th ACM Workshop on Computational Learning Theory. Pittsburgh, USA: ACM Press, 1992: 287-294.
  • 7 Freund Y, Seung H S, Shamir E, et al. Selective Sampling Using the Query by Committee Algorithm [J]. Machine Learning, 1997, 28(2/3): 133-168.
  • 8 McCallum A K, Nigam K. Employing EM and Pool-based Active Learning for Text Classification [C]//Proceedings of the 15th International Conference on Machine Learning. Madison, USA: [s. n.], 1998: 350-358.
  • 9 Muslea I, Minton S, Knoblock C A. Active+Semi-supervised Learning=Robust Multi-view Learning [C]//Proceedings of the 19th International Conference on Machine Learning. Sydney, Australia: [s. n.], 2002: 435-442.
  • 10 Muslea I, Minton S, Knoblock C A. Selective Sampling with Redundant Views [C]//Proceedings of the 17th International Conference on Machine Learning. Stanford, USA: [s. n.], 2000: 621-626.

Citing literature (6)

Secondary citing literature (15)
