Email Filtering Based on the Naive Bayes Model (cited by: 6)

SPAM Filtering with Naive Bayes
Abstract: The effectiveness of naive Bayes in spam filtering depends heavily on how well the mail content is modelled. Research on mail content modelling is still immature, however, which limits the performance of Bayesian methods in spam filtering. This paper models email content with three kinds of probability distribution and proposes a naive Bayes algorithm based on each. To improve training efficiency, the algorithms use an incremental training method. Experiments on two public datasets, trec05p-1 and trec06p, compare the three algorithms and identify the scenarios to which each distribution is suited. Viewed from the perspective of modelling mail content with different distributions, these findings provide an effective basis for choosing a spam filtering method.
Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), CAS, 2014, No. 1, pp. 49-53 (5 pages)
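Funding: Heilongjiang Province Program for New Century Excellent Talents in Regular Higher Education Institutions (1155-ncet-008); Ministry of Education Humanities and Social Sciences Project (11YJC740048); Heilongjiang Province Education Science Planning Project (GBC1211062); Heilongjiang Province Higher Education Teaching Reform Project (2011-NP33)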
Keywords: e-mail filtering; naive Bayes; machine learning
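
The abstract describes naive Bayes filters that are trained incrementally, but this record does not say which three probability distributions the paper uses or how its update rule is defined. The following is therefore only a minimal sketch under stated assumptions: it uses a multinomial word-count model with Laplace smoothing, and the class name `IncrementalNaiveBayes`, its methods, and the whitespace tokenization are all hypothetical, chosen for illustration rather than taken from the paper.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Multinomial naive Bayes that is updated one message at a time.

    Assumption: multinomial token counts with Laplace smoothing. The paper's
    three distributions are not specified in this record.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.token_totals = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def update(self, tokens, label):
        """Fold one labelled message into the counts; no retraining pass."""
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.token_totals[label] += len(tokens)
        self.vocab.update(tokens)

    def log_odds(self, tokens):
        """Return log P(spam | tokens) - log P(ham | tokens)."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = max(len(self.vocab), 1)
        scores = {}
        for label in self.LABELS:
            prior = (self.doc_counts[label] + self.alpha) / (
                total_docs + len(self.LABELS) * self.alpha
            )
            denom = self.token_totals[label] + self.alpha * vocab_size
            score = math.log(prior)
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores["spam"] - scores["ham"]

    def predict(self, tokens):
        return "spam" if self.log_odds(tokens) > 0 else "ham"


# Toy messages, invented purely for illustration.
nb = IncrementalNaiveBayes()
nb.update("win free money now".split(), "spam")
nb.update("meeting agenda for monday".split(), "ham")
nb.update("free prize claim now".split(), "spam")
nb.update("project report attached".split(), "ham")

print(nb.predict("claim free money".split()))
print(nb.predict("monday project meeting".split()))
```

Because `update` only increments counters, each new labelled message costs time proportional to its length, which is the usual motivation for incremental training over periodic batch retraining.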