
Improved feature selection algorithm in spam filtering based on TF*IDF

Times cited: 6
Abstract: As e-mail has spread, the flood of spam has drawn growing attention, and how to select e-mail features is a key problem in e-mail classification. Starting from term frequency (TF) and inverse document frequency (IDF), the paper analyzes and compares several commonly used feature selection algorithms. Because existing feature selection algorithms are overly mechanical, it introduces keyword weights into e-mail classification and proposes an improved TF*IDF feature selection algorithm based on keyword weights, which it verifies experimentally on the PU1 corpus. The results show that a naive Bayes filter improved with this algorithm filters spam more effectively.
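The abstract names TF*IDF-based feature selection but does not give its formulas. As a rough illustration only, the Python sketch below scores terms with one common TF*IDF variant (length-normalized term frequency times log(N/df), summed over the training e-mails) and keeps the top k terms. The function names, the summation over documents, and the optional `keyword_boost` multiplier are assumptions made for illustration; in particular, `keyword_boost` is a hypothetical hook and not the keyword-weighting scheme proposed in the paper.

```python
import math
from collections import Counter


def tfidf_feature_scores(docs, keyword_boost=None):
    """Score every term by its TF*IDF summed over a tokenized training corpus.

    docs: list of token lists, one list per e-mail.
    keyword_boost: optional {term: multiplier} dict. This is a hypothetical
    hook for extra keyword weighting, NOT the scheme proposed in the paper.
    """
    n_docs = len(docs)

    # Document frequency: number of e-mails that contain each term at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = Counter()
    for tokens in docs:
        if not tokens:
            continue  # skip empty e-mails to avoid dividing by zero
        counts = Counter(tokens)
        length = len(tokens)
        for term, count in counts.items():
            tf = count / length                # length-normalized term frequency
            idf = math.log(n_docs / df[term])  # a term in every e-mail gets idf 0
            scores[term] += tf * idf

    # Optional (hypothetical) multiplicative boost for chosen keywords.
    for term, factor in (keyword_boost or {}).items():
        if term in scores:
            scores[term] *= factor
    return scores


def select_features(docs, k, keyword_boost=None):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = tfidf_feature_scores(docs, keyword_boost)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    emails = [
        ["win", "cash", "now", "the"],
        ["the", "meeting", "today"],
        ["win", "prize", "the"],
    ]
    print(select_features(emails, 3))  # prints ['meeting', 'prize', 'today']
```

Under this sketch, a term such as "the" that occurs in every e-mail receives an IDF of log(1) = 0 and falls to the bottom of the ranking. The selected terms would then serve as the feature set for a naive Bayes filter of the kind the abstract evaluates.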
Source: Application Research of Computers (《计算机应用研究》; indexed in CSCD and the Peking University core journal list), 2009, issue 6, pp. 2165-2167 (3 pages)
Funding: Natural Science Foundation of Hebei Province (F2008000877)
Keywords: spam; filter; Bayes; feature selection; TF*IDF

References: 5
