
A Bayesian spam filtering algorithm based on polynomial model and low risk

Cited by: 10
Abstract: When existing Bayesian algorithms are applied to spam filtering, the Bernoulli model used to process the text feature vector of a mail cannot distinguish the importance of individual features, which leads to a low recall rate in mail classification; there is also a risk that legitimate mail is misjudged as spam. To address these problems, a Bayesian polynomial model is adopted to weight the feature vector so that the importance of features is distinguished, and a low-risk strategy is then applied to reduce the risk of misjudging legitimate mail. On this basis, a Bayesian spam filtering algorithm based on the polynomial model and low risk is proposed. Experimental results show that, for different numbers of feature terms, the algorithm effectively improves the precision and recall of mail classification and reduces the risk of misjudging legitimate mail, and that its performance is stable with little fluctuation when filtering mails with a large number of text characters.
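
The two ideas in the abstract, frequency-weighted features and a low-risk decision rule, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the exact weighting formula nor the risk rule. The sketch assumes that the "polynomial model" corresponds to the usual multinomial naive Bayes event model (repeated words count more than once), uses Laplace smoothing, and models "low risk" as a hypothetical `risk_factor` by which the spam posterior must exceed the legitimate-mail posterior. The names `train`, `log_score`, `classify` and `risk_factor` are illustrative.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-class word counts from (tokens, label) pairs.

    label is "spam" or "ham" (legitimate mail).
    """
    counts = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)  # every occurrence counts (multinomial)
        n_docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, n_docs, vocab


def log_score(tokens, label, model):
    """Log prior plus log likelihood of the tokens under one class."""
    counts, n_docs, vocab = model
    total = sum(counts[label].values())
    score = math.log(n_docs[label] / sum(n_docs.values()))
    for t in tokens:
        if t in vocab:  # words never seen in training are ignored
            # Laplace (add-one) smoothing; an assumption of this sketch
            score += math.log((counts[label][t] + 1) / (total + len(vocab)))
    return score


def classify(tokens, model, risk_factor=9.0):
    """Label a mail as spam only if P(spam|d) > risk_factor * P(ham|d).

    risk_factor > 1 biases the decision toward "ham", which lowers the
    chance of misjudging legitimate mail at the cost of some recall.
    """
    margin = log_score(tokens, "spam", model) - log_score(tokens, "ham", model)
    return "spam" if margin > math.log(risk_factor) else "ham"


# Toy training data, invented purely for illustration.
TRAIN_DOCS = [
    (["free", "money", "free"], "spam"),
    (["win", "money", "now"], "spam"),
    (["meeting", "tomorrow", "project"], "ham"),
    (["project", "report", "money"], "ham"),
]
model = train(TRAIN_DOCS)
```

With this toy data, a mail containing only the word "money" is slightly more likely to be spam than legitimate, so a plain maximum-posterior rule (`risk_factor=1.0`) labels it spam, while the default `risk_factor=9.0` keeps it as legitimate mail. A mail such as `["free", "free", "win"]` clears the stricter threshold and is still labeled spam.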
Source: Journal of Central South University: Science and Technology, 2013, No. 7, pp. 2787-2792 (6 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
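Funding: National Natural Science Foundation of China (61272401, 61133005, 61173167, 61070194); sub-project of the National High Technology Research and Development Program ("973" Program) (2012CB315801)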
Keywords: mail filtering; feature extraction; probability measurement; polynomial model; risk assessment

