
Study on robust, partially dependent active learning for classification based on different sample attributes
Abstract: Active learning can substantially reduce the labeling effort by selecting, at each training step, the most informative samples for an expert to label. Both the representativeness and the uncertainty of a sample are important measures of its informativeness, and considering the two jointly yields better overall performance; however, existing ways of combining them suffer from a number of problems that limit the adaptability of the resulting algorithms. To address this, this paper proposes a robust, partially dependent active learning algorithm for classification based on different sample attributes. By introducing a partially dependent weight-coefficient function, the algorithm takes representativeness and uncertainty into account jointly while still emphasizing the distinctive attribute of each sample. Moreover, because the representativeness model changes gradually, the selection process separates the learning stages of representative and uncertain samples: early training is driven mainly by representative samples and later training mainly by uncertain samples, which greatly improves the algorithm's adaptability. Simulation results on datasets from the UCI machine learning repository show that the approach is sound and feasible: on the datasets used, the proposed method reaches the same classification accuracy as the comparison algorithms with fewer labeled samples.
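The selection scheme the abstract describes, a weighted combination of representativeness and uncertainty whose emphasis shifts from representative samples early in training to uncertain samples later, can be sketched roughly as follows. This is a minimal illustration, not the paper's actual formulation: the RBF similarity, the binary-entropy uncertainty measure, and the linear weight schedule `lam` are all assumptions of this sketch.

```python
import math

def rbf_similarity(a, b, gamma=1.0):
    """RBF-kernel similarity between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * d2)

def representativeness(candidate, pool, gamma=1.0):
    """Mean similarity of a candidate to the whole unlabeled pool."""
    return sum(rbf_similarity(candidate, p, gamma) for p in pool) / len(pool)

def uncertainty(prob_positive):
    """Binary-entropy uncertainty of a classifier's probability estimate."""
    p = min(max(prob_positive, 1e-12), 1 - 1e-12)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_query(pool, probs, step, total_steps, gamma=1.0):
    """Pick the pool index maximizing a weighted mix of the two criteria.

    lam grows linearly from 0 to 1, shifting emphasis from
    representativeness (early training) to uncertainty (late training).
    """
    lam = step / max(total_steps - 1, 1)
    scores = [
        (1 - lam) * representativeness(x, pool, gamma) + lam * uncertainty(p)
        for x, p in zip(pool, probs)
    ]
    return max(range(len(pool)), key=scores.__getitem__)

# Demo: a tight cluster near the origin plus one far-away, uncertain point.
pool = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (5.0, 5.0)]
probs = [0.95, 0.9, 0.9, 0.5]  # classifier's P(y=1) for each pool point
print(select_query(pool, probs, step=0, total_steps=5))  # 1: central cluster point
print(select_query(pool, probs, step=4, total_steps=5))  # 3: most uncertain point
```

Early queries favor the point at the center of the dense cluster (high mean similarity), while the final query picks the point whose predicted probability is closest to 0.5, mirroring the representative-first, uncertain-later schedule described above.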
Source: Journal of Yanshan University (CAS), 2011, No. 1, pp. 74-80 (7 pages)
Funding: Natural Science Foundation of Hebei Province (F2008000891, F2010001297); China Postdoctoral Science Foundation (20080440124); second batch of China Postdoctoral Science Foundation special funding (200902356)
Keywords: active learning; partial dependency; representativeness of samples; uncertainty of samples; classification

References (11)

  • 1. 胡正平, 高文涛. SVM training sample reduction and selection algorithm based on improved weighted condensed nearest neighbor and the nearest boundary rule [J]. Journal of Yanshan University, 2010, 34(5): 421-425.
  • 2. Dasgupta S, Kalai A T, Monteleoni C. Analysis of perceptron-based active learning [J]. Journal of Machine Learning Research, 2009, 10(1): 281-299.
  • 3. 龙军, 殷建平, 祝恩, 蔡志平. An active learning algorithm that selects the examples most likely to be predicted incorrectly [J]. Journal of Computer Research and Development, 2008, 45(3): 472-478.
  • 4. Freund Y, Seung H S, Shamir E, et al. Selective sampling using the query by committee algorithm [J]. Machine Learning, 1997, 28(2/3): 133-168.
  • 5. Cohn D A, Ghahramani Z, Jordan M I. Active learning with statistical models [J]. Journal of Artificial Intelligence Research, 1996, 4(1): 129-145.
  • 6. Paisley J, Liao X J, Carin L. Active learning and basis selection for kernel-based linear models: a Bayesian perspective [J]. IEEE Transactions on Signal Processing, 2010, 58(5): 2686-2700.
  • 7. 胡正平, 高文涛, 万春艳. Controllable active learning algorithm based on combining sample uncertainty and representativeness [J]. Journal of Yanshan University, 2009, 33(4): 341-346.
  • 8. Geng B, Yang L J, Zha Z J, et al. Unbiased active learning for image retrieval [C] // Proceedings of the IEEE International Conference on Multimedia and Expo, Hannover, 2008: 1325-1328.
  • 9. Xu Z, Yu K, Tresp V, et al. Representative sampling for text classification using support vector machines [C] // 25th European Conference on Information Retrieval Research, Pisa, Italy, 2003: 393-407.
  • 10. Nguyen H T, Smeulders A. Active learning using pre-clustering [C] // Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada, 2004: 79-86.

