
Improved Naïve Bayes combining feature and noncharacteristic information, and its application  Cited by: 2
Abstract  Naïve Bayes is a common content-based spam filtering algorithm, but traditional Naïve Bayes filtering suffers from uncertainty in judging e-mail content and from an incomplete representation of the e-mail. This paper analyzes how the header fields behave differently in ham and spam e-mail, extracts noncharacteristic information from them, and improves the Naïve Bayes algorithm by combining feature information with noncharacteristic information. Experimental results show that, compared with using feature information alone, the improved Naïve Bayes classifier achieves higher recall and precision on spam, which highlights its advantage of covering more of the e-mail's information and compensating for the weaknesses of purely content-based judgment.
Source  Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2011, No. 2, pp. 514-516 (3 pages)
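Funding  National Natural Science Foundation of China (60873247); Shandong Province High-Tech Independent Innovation Special Project (2008ZZ28); Key Project of the Shandong Provincial Natural Science Foundation (ZR2009GZ007)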
Keywords  e-mail filtering; noncharacteristic information; feature information; Naïve Bayes algorithm
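The idea summarized in the abstract, letting header-derived attributes vote alongside content words, can be illustrated with a small sketch. This is not the authors' implementation: the record does not say which header fields or attributes the paper uses, so the two flags below (missing `Subject`, missing `To`), the helper names `header_flags` and `featurize`, and the toy training messages are all assumptions made for illustration. The classifier is an ordinary multinomial Naïve Bayes with Laplace smoothing in which header attributes are simply added to the token bag.

```python
import math
from collections import Counter, defaultdict


def header_flags(headers):
    """Hypothetical non-content attributes derived from header fields."""
    return [
        "hdr:has_subject" if headers.get("Subject") else "hdr:missing_subject",
        "hdr:has_to" if headers.get("To") else "hdr:missing_to",
    ]


def featurize(headers, body):
    """Combine content tokens with header-derived attribute tokens."""
    return body.lower().split() + header_flags(headers)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_posterior(self, tokens, label):
        total_docs = sum(self.class_counts.values())
        logp = math.log(self.class_counts[label] / total_docs)
        denom = sum(self.token_counts[label].values()) + len(self.vocab)
        for token in tokens:
            logp += math.log((self.token_counts[label][token] + 1) / denom)
        return logp

    def predict(self, tokens):
        return max(self.class_counts,
                   key=lambda label: self.log_posterior(tokens, label))


# Toy training set (invented for this sketch, not data from the paper).
train = [
    (featurize({"Subject": "", "To": ""}, "win cash now"), "spam"),
    (featurize({"Subject": "Meeting", "To": "a@example.com"},
               "project meeting notes"), "ham"),
    (featurize({"Subject": "", "To": "b@example.com"},
               "cheap cash offer"), "spam"),
    (featurize({"Subject": "Lunch", "To": "a@example.com"},
               "lunch notes tomorrow"), "ham"),
]

model = NaiveBayes()
model.fit(train)

# A message whose body looks harmless but whose header is spam-like.
content_only = ["notes"]
combined = featurize({"Subject": "", "To": ""}, "notes")

print(model.predict(content_only))  # ham
print(model.predict(combined))      # spam
```

On this toy data the body alone is classified as ham, while adding the header attributes flips the decision to spam, which is the kind of correction for content-only judgment that the abstract describes. Treating header attributes as extra tokens is only one simple way to combine the two sources; the paper may weight or model them differently.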


