
Novel Incremental Learning and Active Learning Based Spam Identification Method (基于增量学习和主动学习的垃圾邮件识别新方法)
Abstract: Spam identification is an important research topic in computer forensics. Most existing spam identification methods fail to account effectively for the influence of user interest on identification results. This paper proposes a novel spam identification method based on incremental learning and active learning. First, to obtain the most effective features, both term (word) information and non-term information are considered in the feature selection stage. Second, to reduce the time needed to select samples for labeling, a projection-based uncertain sample selection method is proposed. Finally, for the sample labeling process, a new labeling method is proposed that automatically recommends each sample's category and user interest degree. Comparative experiments show that the method achieves high spam identification precision, selects samples for labeling quickly, and imposes a low labeling burden on users, indicating high practical value.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2015, Issue B10, pp. 23-27 (5 pages)
Funding: This work was supported by the Open Fund of the Key Laboratory of Information Assurance Technology (KJ-14-008).
Keywords: spam identification; computer forensics; incremental learning; active learning; sample labeling; user interest degree
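
The abstract's combination of incremental learning with uncertain-sample selection can be illustrated with a minimal sketch. It is not the paper's method: the projection-based selection and the interest-degree recommendation are not specified in the abstract, so this sketch substitutes plain uncertainty sampling (query the sample whose predicted spam probability is closest to 0.5) and a small online logistic regression. All names (`IncrementalLogReg`, `most_uncertain`, `active_learning_loop`, `oracle`) and the two-dimensional toy features are hypothetical.

```python
import math


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class IncrementalLogReg:
    """Tiny online logistic regression, updated one labeled sample at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        # Estimated probability that x is spam (label 1).
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, x, y):
        # One stochastic-gradient step on the log-loss for a single sample.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def most_uncertain(model, pool):
    """Index of the pool sample whose spam probability is closest to 0.5."""
    return min(range(len(pool)),
               key=lambda i: abs(model.predict_proba(pool[i]) - 0.5))


def active_learning_loop(model, pool, oracle, budget):
    """Query up to `budget` labels, always asking about the most uncertain sample.

    `oracle` stands in for the user who labels a mail as spam (1) or not (0).
    The caller's pool is copied, not modified.
    """
    pool = list(pool)
    queried = []
    for _ in range(min(budget, len(pool))):
        x = pool.pop(most_uncertain(model, pool))
        y = oracle(x)
        model.partial_fit(x, y)  # incremental update, no retraining from scratch
        queried.append((x, y))
    return queried


# Toy usage: feature 0 is a "spam-like" score, feature 1 a "ham-like" score.
toy_pool = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_model = IncrementalLogReg(n_features=2)
labeled = active_learning_loop(
    toy_model, toy_pool, oracle=lambda x: 1 if x[0] > x[1] else 0, budget=2
)
```

Because the model is updated after every answer, each query is chosen using everything labeled so far. This is the general motivation for pairing active learning with an incremental learner.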


