
Study on robust, partially dependent active learning for classification based on different sample attributes
Abstract: Active learning can substantially reduce the labeling effort by selecting, at each training step, the most informative samples for an expert to label. Both the representativeness and the uncertainty of a sample are important measures of its informativeness, and considering the two jointly yields better overall performance; however, existing ways of combining them suffer from a number of problems that limit the adaptability of the resulting algorithms. To address this, this paper proposes a robust, partially dependent active learning algorithm for classification based on different sample attributes. By introducing a partially dependent weight-coefficient function, the algorithm takes representativeness and uncertainty into account jointly while still emphasizing the distinctive attribute of each sample. Moreover, because the representativeness model changes gradually, the selection process separates the learning stages of representative and uncertain samples: early training is driven mainly by representative samples and later training mainly by uncertain samples, which greatly improves the algorithm's adaptability. Simulation results on datasets from the UCI machine learning repository show that the approach is sound and feasible: on the datasets used, the proposed method reaches the same classification accuracy as the comparison algorithms with fewer labeled samples.
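The selection scheme the abstract describes, a weighted combination of representativeness and uncertainty whose emphasis shifts from representative samples early in training to uncertain samples later, can be sketched roughly as follows. This is a minimal illustration, not the paper's actual formulation: the RBF similarity, the binary-entropy uncertainty measure, and the linear weight schedule `lam` are all assumptions of this sketch.

```python
import math

def rbf_similarity(a, b, gamma=1.0):
    """RBF-kernel similarity between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * d2)

def representativeness(candidate, pool, gamma=1.0):
    """Mean similarity of a candidate to the whole unlabeled pool."""
    return sum(rbf_similarity(candidate, p, gamma) for p in pool) / len(pool)

def uncertainty(prob_positive):
    """Binary-entropy uncertainty of a classifier's probability estimate."""
    p = min(max(prob_positive, 1e-12), 1 - 1e-12)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_query(pool, probs, step, total_steps, gamma=1.0):
    """Pick the pool index maximizing a weighted mix of the two criteria.

    lam grows linearly from 0 to 1, shifting emphasis from
    representativeness (early training) to uncertainty (late training).
    """
    lam = step / max(total_steps - 1, 1)
    scores = [
        (1 - lam) * representativeness(x, pool, gamma) + lam * uncertainty(p)
        for x, p in zip(pool, probs)
    ]
    return max(range(len(pool)), key=scores.__getitem__)

# Demo: a tight cluster near the origin plus one far-away, uncertain point.
pool = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (5.0, 5.0)]
probs = [0.95, 0.9, 0.9, 0.5]  # classifier's P(y=1) for each pool point
print(select_query(pool, probs, step=0, total_steps=5))  # 1: central cluster point
print(select_query(pool, probs, step=4, total_steps=5))  # 3: most uncertain point
```

Early queries favor the point at the center of the dense cluster (high mean similarity), while the final query picks the point whose predicted probability is closest to 0.5, mirroring the representative-first, uncertain-later schedule described above.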
Source: Journal of Yanshan University (CAS), 2011, No. 1, pp. 74-80 (7 pages)
Funding: Natural Science Foundation of Hebei Province (F2008000891, F2010001297); China Postdoctoral Science Foundation (20080440124); second batch of China Postdoctoral Science Foundation special funding (200902356)
Keywords: active learning; partial dependency; representativeness of samples; uncertainty of samples; classification

References (11)

  • 1. 胡正平, 高文涛. SVM training sample reduction and selection algorithm based on improved weighted condensed nearest neighbor and the nearest boundary rule [J]. Journal of Yanshan University, 2010, 34(5): 421-425.
  • 2. Dasgupta S, Kalai A T, Monteleoni C. Analysis of perceptron-based active learning [J]. Journal of Machine Learning Research, 2009, 10(1): 281-299.
  • 3. 龙军, 殷建平, 祝恩, 蔡志平. An active learning algorithm that selects the examples most likely to be predicted incorrectly [J]. Journal of Computer Research and Development, 2008, 45(3): 472-478.
  • 4. Freund Y, Seung H S, Shamir E, et al. Selective sampling using the query by committee algorithm [J]. Machine Learning, 1997, 28(2/3): 133-168.
  • 5. Cohn D A, Ghahramani Z, Jordan M I. Active learning with statistical models [J]. Journal of Artificial Intelligence Research, 1996, 4(1): 129-145.
  • 6. Paisley J, Liao X J, Carin L. Active learning and basis selection for kernel-based linear models: a Bayesian perspective [J]. IEEE Transactions on Signal Processing, 2010, 58(5): 2686-2700.
  • 7. 胡正平, 高文涛, 万春艳. Controllable active learning algorithm based on combining sample uncertainty and representativeness [J]. Journal of Yanshan University, 2009, 33(4): 341-346.
  • 8. Geng B, Yang L J, Zha Z J, et al. Unbiased active learning for image retrieval [C] // Proceedings of the IEEE International Conference on Multimedia and Expo, Hannover, 2008: 1325-1328.
  • 9. Xu Z, Yu K, Tresp V, et al. Representative sampling for text classification using support vector machines [C] // 25th European Conference on Information Retrieval Research, Pisa, Italy, 2003: 393-407.
  • 10. Nguyen H T, Smeulders A. Active learning using pre-clustering [C] // Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada, 2004: 79-86.

