Image Spam Filtering Method Based on Base64 Encoding (cited by: 7)

Abstract: Extracting embedded text from images to filter image spam is time-consuming and yields limited classification accuracy, while filtering on image-property features suffers from low recall. This paper proposes a fast and effective method for detecting image spam. The Base64-encoded text of each image is tokenized into 4-gram features, the features are represented as a binary vector, and a Support Vector Machine (SVM) classifier is trained to distinguish spam images from legitimate ones. Experimental results show that the method recognizes spam images in different formats and achieves a precision, recall, and F1 of 99.85%, 99.49%, and 99.67%, respectively.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 8, pp. 194-196 (3 pages)
Funding: National Natural Science Foundation of China (60970081); National "863" Program of China (2007AA01Z197)
Keywords: image spam; Base64 encoding; 4-gram tokenization; Support Vector Machine (SVM)
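
The feature-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say whether the 4-grams overlap, so a sliding window of overlapping character 4-grams is assumed here, and the function names (`base64_ngrams`, `build_vocabulary`, `binary_vector`) are hypothetical. In a real mail filter the Base64 text would normally be read straight from the MIME part; the sketch encodes raw bytes only to stay self-contained.

```python
import base64
from typing import Dict, Iterable, List, Set


def base64_ngrams(image_bytes: bytes, n: int = 4) -> Set[str]:
    """Return the set of character n-grams in the Base64 text of an image.

    Assumption: overlapping (sliding-window) n-grams; the abstract only
    says the Base64 text is split into 4-grams.
    """
    text = base64.b64encode(image_bytes).decode("ascii")
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_vocabulary(images: Iterable[bytes], n: int = 4) -> Dict[str, int]:
    """Map every n-gram seen in the training images to a column index."""
    vocab: Dict[str, int] = {}
    for data in images:
        # Sort so that index assignment is deterministic across runs.
        for gram in sorted(base64_ngrams(data, n)):
            vocab.setdefault(gram, len(vocab))
    return vocab


def binary_vector(image_bytes: bytes, vocab: Dict[str, int], n: int = 4) -> List[int]:
    """Binary representation: 1 if the n-gram occurs at least once, else 0."""
    vec = [0] * len(vocab)
    for gram in base64_ngrams(image_bytes, n):
        idx = vocab.get(gram)
        if idx is not None:  # n-grams unseen during training are ignored
            vec[idx] = 1
    return vec
```

The resulting vectors, labeled spam or legitimate, could then be passed to an SVM trainer. The abstract does not state the kernel or training parameters, so that step is left out of the sketch.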