Double-layer reduction method optimizes large-scale SVM social spam detection model (cited by: 5)
Abstract: To address the speed bottleneck that support vector machines (SVMs) encounter when training on large-scale datasets, this paper proposes a new sample reduction method, called the double-layer reduction method. It reduces the samples at two levels, coarse-grained and fine-grained. At the coarse-grained level, kernel-space distance clustering (KDC) is used to delete redundant subsets cluster by cluster. At the fine-grained level, the support vectors are picked out point by point from the remaining clusters using SMO. Experiments show that the double-layer reduction method compresses the sample data effectively, while also amplifying the classification features of the dataset and improving the accuracy of the classifier. Applied to optimizing the training set of a large-scale SVM social spam (spam tag) detection model, the method clearly speeds up training of the detection model. By introducing the concepts of granularity and level into sample reduction, the method adjusts the reduction intensity during reduction according to the number of redundant points remaining, which gives it an advantage over traditional reduction methods.
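Authors: Qin Xi (覃希), Su Yidan (苏一丹)
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core, 2011, No. 6, pp. 2095-2098 (4 pages)
Funding: Natural Science Foundation of Guangxi University of Technology (广西工学院), grant 院科自1074011
Keywords: Folksonomy; social spam (spam tags); support vector machine (SVM); double-layer reduction method; reduction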
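The two-level idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the greedy grouping rule, the `radius2` and `margin2` thresholds, the toy data, and all function names are hypothetical. The fine-grained step uses a bias-free dual coordinate ascent as a simple stand-in for SMO. The only piece taken as fact is the squared kernel-space distance, d²(x, y) = k(x, x) - 2k(x, y) + k(y, y).

```python
import math

def rbf(x, y, gamma=0.1):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_dist2(x, y, k=rbf):
    """Squared distance between x and y in the kernel-induced feature space."""
    return k(x, x) - 2.0 * k(x, y) + k(y, y)

def coarse_reduce(X, y, radius2, margin2):
    """Coarse-grained step (assumed rule): greedily group points that lie within
    radius2 of a cluster seed, then drop every cluster that is label-pure and
    has no opposite-class point within margin2 of its seed."""
    clusters = []  # each cluster is a list of indices; the first one is the seed
    for i, x in enumerate(X):
        for c in clusters:
            if kernel_dist2(x, X[c[0]]) <= radius2:
                c.append(i)
                break
        else:
            clusters.append([i])
    kept = []
    for c in clusters:
        seed = c[0]
        pure = len({y[i] for i in c}) == 1
        near_other = any(
            y[j] != y[seed] and kernel_dist2(X[j], X[seed]) <= margin2
            for j in range(len(X))
        )
        if not pure or near_other:
            kept.extend(c)
    return sorted(kept)

def fine_reduce(X, y, idx, C=10.0, sweeps=50, k=rbf):
    """Fine-grained step: fit a bias-free kernel SVM on the surviving points by
    dual coordinate ascent (a stand-in for SMO) and keep the points whose dual
    variable is non-zero, i.e. the support vectors. Labels must be +1 or -1."""
    n = len(idx)
    K = [[k(X[i], X[j]) for j in idx] for i in idx]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for a in range(n):
            f = sum(alpha[b] * y[idx[b]] * K[a][b] for b in range(n))
            step = (1.0 - y[idx[a]] * f) / K[a][a]
            alpha[a] = min(C, max(0.0, alpha[a] + step))
    return [idx[a] for a in range(n) if alpha[a] > 1e-8]

# Toy data: each class has a tight group far from the boundary and a few
# points close to the other class.
X = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),          # +1, far away
     (1.0, 0.0), (1.2, 0.1), (1.0, 0.3),                      # +1, near boundary
     (-5.0, -5.0), (-5.1, -5.0), (-5.0, -5.1), (-5.1, -5.1),  # -1, far away
     (-1.0, 0.0), (-1.2, -0.1), (-1.0, -0.3)]                 # -1, near boundary
y = [1] * 7 + [-1] * 7

kept = coarse_reduce(X, y, radius2=0.05, margin2=1.0)
svs = fine_reduce(X, y, kept)
print(len(X), len(kept), len(svs))
```

On this toy set the coarse step discards the two far-away groups as whole clusters, and the fine step then works only on the points near the class boundary. The thresholds would need tuning on real data, and a production version would train with an actual SMO solver.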