
Improved Bayesian filtering algorithm for spam (Times cited: 1)
Abstract: Building on a study of the principles and implementation of Bayesian filtering algorithms, this paper proposes three improvements. First, the prior probability of spam is changed from a constant to the actual (observed) probability. Second, the scope and rules for selecting tokens are improved. Third, URLs and images are added to the content that is inspected. Finally, a spam filter based on the improved Bayesian filtering algorithm is designed. Experimental results show that the improved algorithm performs well when applied to spam filtering.
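To make the prior and URL changes concrete, here is a minimal Python sketch of a naive Bayes spam scorer. It estimates the spam prior from the training counts instead of fixing it at a constant, adds one `url:<host>` token per link, and keeps only the `top_n` most decisive tokens. The tokenizer, the Laplace smoothing constant `alpha`, the `top_n=15` cutoff and every name below are illustrative assumptions, not the paper's actual token-selection rules; image inspection is not shown.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical tokenization rules (the paper's actual rules are not given):
# each link contributes a "url:<host>" token, and the remaining text is
# split into lowercase word tokens.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    tokens = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname  # urlparse returns the hostname lowercased
        if host:
            tokens.add("url:" + host)
    body = URL_RE.sub(" ", text).lower()
    tokens.update(WORD_RE.findall(body))
    return tokens


def _sigmoid(z):
    # Numerically stable logistic function: converts log-odds to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class NaiveBayesSpamFilter:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.doc_counts = {"spam": 0, "ham": 0}
        self.token_counts = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # label is "spam" or "ham"; each token counts once per message
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def prior_spam(self):
        # Prior taken from the observed spam/ham ratio, not a fixed 0.5.
        ns, nh = self.doc_counts["spam"], self.doc_counts["ham"]
        return ns / (ns + nh)

    def spam_probability(self, text, top_n=15, fixed_prior=None):
        ns, nh = self.doc_counts["spam"], self.doc_counts["ham"]
        if ns == 0 or nh == 0:
            raise ValueError("train on at least one spam and one ham message")
        # fixed_prior, if given, must lie strictly between 0 and 1
        prior = self.prior_spam() if fixed_prior is None else fixed_prior
        evidence = []
        for tok in tokenize(text):
            cs, ch = self.token_counts["spam"][tok], self.token_counts["ham"][tok]
            if cs == 0 and ch == 0:
                continue  # ignore tokens never seen in training
            p_s = (cs + self.alpha) / (ns + 2 * self.alpha)
            p_h = (ch + self.alpha) / (nh + 2 * self.alpha)
            evidence.append(math.log(p_s) - math.log(p_h))
        # Keep only the top_n most decisive tokens (largest |log-likelihood ratio|).
        evidence.sort(key=abs, reverse=True)
        log_odds = math.log(prior) - math.log(1 - prior) + sum(evidence[:top_n])
        return _sigmoid(log_odds)
```

Train with `train(text, "spam")` or `train(text, "ham")`, then call `spam_probability(text)`. Passing `fixed_prior=0.5` reproduces a constant-prior baseline, which makes it easy to compare the two settings on the same data.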
Source: Journal of Beijing University of Chemical Technology (Natural Science Edition), 2008, No. 6, pp. 93-97 (5 pages). Indexed in EI, CAS, CSCD and the Peking University core journal list.
Funding: National Science and Technology Support Program of the 11th Five-Year Plan (2006BAK31B04)
Keywords: spam email; improved Bayesian filtering; content detection


