An anti-spam filtering algorithm based on cost minimization (cited by: 2)
Abstract: Drawing on the characteristics of the anti-spam problem, established text categorization techniques are applied to spam filtering. Because misclassifying a legitimate message as spam costs the mail user considerably more than the reverse error, a cost (loss) function is defined and combined with the naive Bayes algorithm, yielding a spam filtering algorithm based on cost minimization. Experiments on a well-known public spam collection confirm the effectiveness of introducing the cost function.
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, Issue z1, pp. 352-355 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: anti-spam filtering; cost minimization; Bayesian classification; text categorization
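
The abstract does not give the form of the cost function, so the following is only a minimal sketch of how a cost-sensitive decision rule can sit on top of a naive Bayes classifier, not the authors' implementation. It assumes a common formulation in which blocking a legitimate message costs `cost_ratio` times as much as delivering a spam message, so a message is blocked only when P(spam|x) / P(legit|x) exceeds `cost_ratio`. The multinomial event model, Laplace smoothing, the names `train`, `log_posteriors`, `classify`, and `cost_ratio`, and the toy training messages are all illustrative assumptions.

```python
import math
from collections import Counter

LABELS = ("spam", "legit")

def train(docs):
    """Collect counts from docs, a list of (tokens, label) pairs.

    Assumes every label in LABELS occurs at least once in docs.
    """
    class_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def log_posteriors(tokens, model):
    """Return log P(label | tokens) under multinomial naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for token in tokens:
            if token in vocab:  # words never seen in training are skipped
                # Laplace (add-one) smoothing avoids zero probabilities
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    # Normalize with log-sum-exp so the posteriors sum to 1
    peak = max(scores.values())
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores.values()))
    return {label: s - log_z for label, s in scores.items()}

def classify(tokens, model, cost_ratio=9.0):
    """Pick the action with the smaller expected loss.

    Assumed losses: blocking a legitimate message costs cost_ratio,
    delivering a spam message costs 1, correct decisions cost 0.
    Blocking is cheaper only if P(spam|x) > cost_ratio * P(legit|x).
    """
    lp = log_posteriors(tokens, model)
    if lp["spam"] - lp["legit"] > math.log(cost_ratio):
        return "spam"
    return "legit"

# Toy training data (hypothetical, for illustration only)
model = train([
    ("win cash prize now".split(), "spam"),
    ("cheap cash offer".split(), "spam"),
    ("meeting agenda attached".split(), "legit"),
    ("project meeting tomorrow".split(), "legit"),
])
borderline = "cash prize".split()
plain_decision = classify(borderline, model, cost_ratio=1.0)
cautious_decision = classify(borderline, model, cost_ratio=9.0)
```

With `cost_ratio=1.0` the rule reduces to ordinary naive Bayes (pick the more probable class). Raising `cost_ratio` makes the filter more reluctant to block, which is the behaviour the abstract motivates: a message that is only moderately more likely to be spam than legitimate is delivered rather than risked.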