
A novel support vector machine active learning strategy (cited by: 10)
Abstract: This paper proposes a new active learning strategy for support vector machines (SVM), called Dix-SVMactive. Generally, the closer a sample lies to the hyperplane, the more uncertain and informative it is, and thus the more valuable it is to label. Active learning is an iterative process, so convergence speed should also be considered. By defining a new confidence measure for samples, the method selects the most valuable samples for manual labeling. The confidence of an unlabeled sample, which can be regarded as the value of that sample, is defined as the ratio of the mean distance between the sample and the labeled samples to the distance between the sample and the hyperplane. The mean distance to the labeled samples measures how redundant the sample is with respect to the labeled set, while the distance to the hyperplane expresses the uncertainty of the sample. In general, the larger the former and the smaller the latter, the higher the confidence of the sample. Additionally, the labeled set obtained after each iteration may be unbalanced, meaning that the hyperplane may lie somewhat far from one class and closer to the other. In this situation, under the proposed selection rule, the selected samples close to the hyperplane will outnumber those far from it, which may lead to poor generalization performance.
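The confidence ratio described above can be illustrated with a minimal sketch. The abstract does not specify the distance metric or the kernel, so this assumes Euclidean distance and a linear hyperplane w·x + b = 0; the function names and the small `eps` guard against division by zero are hypothetical and not taken from the paper.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0 (linear case assumed)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def mean_distance_to_labeled(x, labeled):
    """Mean Euclidean distance from x to the labeled samples (metric is an assumption)."""
    return sum(math.dist(x, z) for z in labeled) / len(labeled)


def confidence(x, labeled, w, b, eps=1e-12):
    """Ratio from the abstract: mean distance to labeled samples over distance to the hyperplane.

    eps is an assumption added here to avoid division by zero.
    """
    return mean_distance_to_labeled(x, labeled) / (distance_to_hyperplane(x, w, b) + eps)


def select_top_k(unlabeled, labeled, w, b, k):
    """Indices of the k unlabeled samples with the highest confidence, in descending order."""
    scores = [confidence(x, labeled, w, b) for x in unlabeled]
    return sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    labeled = [(0.0, 2.0), (0.0, -2.0)]
    w, b = (0.0, 1.0), 0.0  # hyperplane y = 0
    unlabeled = [(3.0, 0.1), (0.0, 1.9), (3.0, 1.5)]
    # Sample 0 is near the hyperplane and far from the labeled points, so it ranks first.
    print(select_top_k(unlabeled, labeled, w, b, k=2))  # prints [0, 2]
```

In this toy case the sample at (0.0, 1.9) scores lowest: it sits almost on top of an already labeled point and far from the hyperplane, so it is both redundant and unambiguous.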
To avoid such imbalance, after each iteration the proposed algorithm checks the balance degree of the dataset, defined as the ratio of the number of minority-class samples to the number of majority-class samples. When the ratio is not greater than a given threshold e, the dataset is regarded as unbalanced. In that case, some majority-class samples are removed by a strategy such as clustering so that the two classes contain equal numbers of samples. The balance degree of the selected dataset is thus adjusted at each iteration to obtain good generalization ability. In summary, the confidence of each sample is computed first, the samples are sorted by confidence in descending order, and the top few are added to the training set; finally, the balance of the training set is adjusted in each iteration. Experimental results on University of California, Irvine (UCI) benchmark datasets demonstrate that the proposed approach not only improves classification accuracy but also reduces the manual labeling workload compared with commonly used approaches, namely SVMactive based on random sampling and the Tong SVMactive approach.
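The balance check can be sketched as follows. The abstract mentions removing majority-class samples "by some strategy like clustering" without giving details, so this sketch swaps in plain random undersampling as a stand-in; the function names, the `seed` parameter, and the binary-label assumption are hypothetical.

```python
import random
from collections import Counter


def balance_degree(labels):
    """Ratio of minority-class count to majority-class count (0.0 if fewer than two classes)."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / max(counts.values())


def rebalance(samples, labels, threshold, seed=0):
    """Equalize two classes when the balance degree is not greater than threshold.

    Random undersampling is used here as a stand-in for the clustering-based
    removal mentioned in the abstract; this is an assumption, not the paper's method.
    """
    counts = Counter(labels)
    if len(counts) < 2 or balance_degree(labels) > threshold:
        return list(samples), list(labels)  # nothing to adjust
    majority = max(counts, key=counts.get)
    n_minority = min(counts.values())
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept_majority = set(rng.sample(majority_idx, n_minority))
    keep = [i for i, y in enumerate(labels) if y != majority or i in kept_majority]
    return [samples[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    samples = list(range(8))
    labels = [1, 1, 1, 1, 1, 1, -1, -1]
    print(balance_degree(labels))  # 2 minority / 6 majority
    new_samples, new_labels = rebalance(samples, labels, threshold=0.5)
    print(Counter(new_labels))  # two samples remain in each class
```

In an active learning loop, a step like this would run after the newly labeled samples are added to the training set and before the SVM is retrained.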
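Source: Journal of Nanjing University (Natural Science), CSCD, Peking University Core Journal, 2012, No. 2, pp. 182-189 (8 pages)
Funding: National Natural Science Foundation of China (60975035); Program for New Century Excellent Talents in University, Ministry of Education (NCET-07-0525); Doctoral Program Foundation of the Ministry of Education (20091401110003); Natural Science Foundation of Shanxi Province (2009011017-2); Shanxi Province Graduate Innovation Project (20103021)
Keywords: support vector machine, active learning, confidence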

References (16)

  • 1. Simon H A, Lea G. Problem solving and rule induction: A unified view knowledge and organization[J]. Erlbaum, 1974, (02): 63-73.
  • 2. Han Guang, Zhao Chunxia, Hu Xuelei. A new SVM active learning algorithm and its application in obstacle detection[J]. Journal of Computer Research and Development, 2009, 46(11): 1934-1941. (in Chinese) Cited by: 14
  • 3. Dagan I, Engelson S. Committee-based sampling for training probabilistic classifiers[A]. Tahoe City: Morgan Kaufmann, 1995. 150-157.
  • 4. Lewis D D, Gale W A. A sequential algorithm for training text classifiers (uncertainty sampling)[A]. London: Springer-Verlag, 1994. 3-12.
  • 5. Tong S, Koller D. Support vector machine active learning with applications to text classification[J]. Journal of Machine Learning Research, 2001. 45-66.
  • 6. Schohn G, Cohn D. Less is more: Active learning with support vector machines[A]. San Francisco: Morgan Kaufmann Publishers, 2000. 45-66.
  • 7. Seung H S, Opper M, Sompolinsky H. Query by committee[A]. University of California: Association for Computing Machinery, 1992. 287-294.
  • 8. Freund Y, Seung H S, Samir E. Selective sampling using the query by committee algorithm[J]. Machine Learning, 1997, (23): 133-168.
  • 9. Vapnik V N. The nature of statistical learning theory[M]. New York: Springer-Verlag, 2000. 1-334.
  • 10. Vapnik V. Statistical learning theory[M]. New York: Wiley, 1998. 11-23.

