Simplified Chinese spam mail filter: design and performance evaluation

Original title: 简体中文垃圾邮件分类的实验设计及对比研究 (Cited by: 3)
Abstract: This paper surveys the technical approaches and methods for filtering unsolicited bulk e-mail, also known as spam. Building on an analysis of keyword-based methods and statistical methods, it proposes combining the two and filtering spam with classifiers from pattern recognition: naive Bayesian decision, nearest-neighbor classification, and linear classification based on the perceptron criterion function. All three classifiers use a feature set selected by the maximum mutual information criterion, and a comparison of them on this common basis reveals their respective advantages and disadvantages for spam filtering. The paper also offers useful ideas for applying the maximum mutual information criterion to spam filtering.
Authors: Li Weijie (李维杰), Xu Yong (徐勇)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2007, No. 25, pp. 128-132 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 60602038); Natural Science Foundation of Guangdong Province of China (Grant No. 06300862)
Keywords: spam mail; classifier; Bayesian decision; nearest-neighbor decision; perceptron criterion function
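
The pipeline outlined in the abstract (features chosen by mutual information, then naive Bayes, nearest-neighbor, and perceptron classifiers compared on the same feature set) can be illustrated with a minimal sketch. This record gives no implementation details, so everything below is an assumption made for illustration: the toy English tokens (real Chinese mail would first need word segmentation), the Bernoulli naive Bayes with Laplace smoothing, the 1-nearest-neighbor rule with Hamming distance, the single-sample fixed-increment perceptron update, the choice of k = 4 features, and all function names. It is not the authors' code.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, label), label 1 = spam, 0 = legitimate.
TRAIN = [
    (["free", "prize", "click"], 1),
    (["free", "offer", "click"], 1),
    (["prize", "offer", "now"], 1),
    (["meeting", "report", "now"], 0),
    (["meeting", "lunch", "today"], 0),
    (["report", "lunch", "today"], 0),
]

def mutual_information(docs, term):
    """Mutual information (nats) between presence of `term` and the label."""
    n = len(docs)
    joint = Counter((term in tokens, label) for tokens, label in docs)
    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == label) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

def select_features(docs, k):
    """Keep the k terms with the highest mutual information (ties: alphabetical)."""
    vocab = {t for tokens, _ in docs for t in tokens}
    return sorted(vocab, key=lambda t: (-round(mutual_information(docs, t), 9), t))[:k]

def vectorize(tokens, features):
    """Binary presence vector over the selected features."""
    return [1 if f in tokens else 0 for f in features]

def train_naive_bayes(X, y):
    """Bernoulli naive Bayes with Laplace smoothing; returns a predict function."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (len(rows) / len(X), probs)

    def predict(x):
        def log_posterior(c):
            prior, probs = model[c]
            return math.log(prior) + sum(
                math.log(p if xi else 1 - p) for xi, p in zip(x, probs))
        return max(model, key=log_posterior)
    return predict

def train_nearest_neighbor(X, y):
    """1-nearest-neighbor using Hamming distance; returns a predict function."""
    def predict(x):
        dists = [sum(a != b for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict

def train_perceptron(X, y, epochs=20):
    """Linear classifier trained with the fixed-increment perceptron rule."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            target = 1 if label == 1 else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * score <= 0:  # misclassified (or on the boundary)
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

features = select_features(TRAIN, k=4)
X = [vectorize(tokens, features) for tokens, _ in TRAIN]
y = [label for _, label in TRAIN]
classifiers = {
    "naive Bayes": train_naive_bayes(X, y),
    "nearest neighbor": train_nearest_neighbor(X, y),
    "perceptron": train_perceptron(X, y),
}
message = vectorize(["free", "click", "now"], features)
for name, predict in classifiers.items():
    print(name, "->", "spam" if predict(message) == 1 else "legitimate")
```

Training all three classifiers on one shared feature vector mirrors the comparison setup the abstract describes. In this toy corpus the token "now" appears equally often in both classes, so its mutual information is zero and it is never selected.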

