A Minimum-Loss Spam E-mail Filtering Algorithm Based on Bayes' Formula

Published English title: Minimizing Cost Filtering Algorithm for Spam E-mail Based on Bayesian
Abstract: Two kinds of error carry a cost in spam filtering: false positives, where legitimate mail is misclassified as spam, and false negatives, where spam is misclassified as legitimate mail. To reduce the loss caused by both, this paper first proposes a new evaluation function built on two existing evaluation functions for text feature extraction: expected cross entropy and mutual information. The new function extracts more representative feature vectors from e-mail messages. On top of it, the paper presents a spam filtering method based on Bayes' formula that reduces the misclassification loss. Simulation tests show that the new method, using the new evaluation function, effectively lowers both the false-positive rate and the false-negative rate.
Source: Modern Electronics Technique (《现代电子技术》), 2006, No. 24, pp. 55-57 (3 pages)
Funding: Supported by the Natural Science Foundation of Hubei Province (2005ABA238)
Keywords: Bayes' formula; evaluation function; minimum loss; spam
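
The abstract does not state the paper's formulas, so the following is only a minimal sketch of the general idea of minimum-loss Bayesian filtering, not the authors' algorithm. It assumes a Bernoulli naive Bayes model with Laplace smoothing over a fixed vocabulary, at least one training message per class, and two hypothetical cost parameters, `fp_cost` (legitimate mail marked as spam) and `fn_cost` (spam let through). The new evaluation function for feature selection is not reproduced because the abstract gives no definition of it; the vocabulary here is simply supplied by hand.

```python
import math
from collections import Counter

LABELS = ("spam", "legit")

def train(messages):
    """Count messages per class and, per class, how many messages contain each word.

    messages: list of (tokens, label) pairs, label being "spam" or "legit".
    """
    doc_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(set(tokens))  # presence, not frequency
    return doc_counts, word_counts

def posterior_spam(tokens, doc_counts, word_counts, vocabulary):
    """Return P(spam | message) via Bayes' formula under the naive independence assumption."""
    total = sum(doc_counts.values())
    present = set(tokens)
    log_score = {}
    for label in LABELS:
        n = doc_counts[label]
        score = math.log(n / total)  # log prior; assumes n > 0
        for word in vocabulary:
            # Laplace-smoothed probability that a message of this class contains the word
            p = (word_counts[label][word] + 1) / (n + 2)
            score += math.log(p if word in present else 1.0 - p)
        log_score[label] = score
    # Normalise in log space to avoid underflow
    top = max(log_score.values())
    weights = {label: math.exp(s - top) for label, s in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["legit"])

def classify(p_spam, fp_cost=9.0, fn_cost=1.0):
    """Pick the decision with the smaller expected loss.

    fp_cost and fn_cost are illustrative values, not taken from the paper.
    """
    risk_if_marked_spam = fp_cost * (1.0 - p_spam)  # loss if the mail was legitimate
    risk_if_marked_legit = fn_cost * p_spam         # loss if the mail was spam
    return "spam" if risk_if_marked_spam < risk_if_marked_legit else "legit"

# Toy usage with made-up data
training = [(["win", "money"], "spam"), (["meeting", "notes"], "legit")]
vocabulary = ["win", "money", "meeting", "notes"]
doc_counts, word_counts = train(training)
p = posterior_spam(["win", "money"], doc_counts, word_counts, vocabulary)
decision = classify(p)
```

Raising `fp_cost` relative to `fn_cost` moves the decision threshold on `P(spam | message)` toward 1, so fewer legitimate messages are blocked at the price of letting more spam through. With equal costs the rule reduces to picking the more probable class.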