A Two-Stage Spam Email Filtering Method Based on Naive Bayes and Hierarchical Clustering (cited by: 5)
Abstract: To reduce the misclassification of legitimate emails, this paper proposes a two-stage spam email filtering method based on naive Bayes and hierarchical clustering. The method divides emails into three classes: legitimate, suspicious (unsure), and spam. In the first stage, a naive Bayes classifier, chosen for its speed and good classification performance, makes a preliminary classification of each email as legitimate or suspicious. In the second stage, drawing on the sending characteristics of spam, a hierarchical clustering algorithm compares emails for similarity against a pre-collected set of spam emails. Experiments show that the method significantly improves the precision of spam detection and reduces the misclassification of legitimate emails, making it better suited to practical use.
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2007, No. 8, pp. 1-3, 7 (4 pages)
Funding: National "863" Program project (2003AA148010); National Torch Program project (2005EB011484)
Keywords: naive Bayes; hierarchical clustering; spam email filtering
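
The abstract gives no implementation details, so the following is only a minimal Python sketch of how the two stages could fit together. The Laplace-smoothed multinomial naive Bayes, the 0.5 probability threshold, the word-set Jaccard distance, the single-linkage merging rule, and the 0.5 distance cutoff are all assumptions chosen for illustration, as are the names `NaiveBayes`, `single_linkage_clusters`, and `classify` and the toy training data. None of them is taken from the paper.

```python
import math
from collections import Counter


def tokens(text):
    """Lower-cased whitespace tokenisation (an assumed, simplistic feature extractor)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (assumed variant)."""

    def fit(self, docs, labels):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.ndocs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokens(doc))
        self.vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        return self

    def spam_probability(self, doc):
        log_post = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            lp = math.log(self.ndocs[label] / sum(self.ndocs.values()))
            for w in tokens(doc):
                if w in self.vocab:  # words unseen in training are ignored
                    lp += math.log((self.counts[label][w] + 1) / (total + len(self.vocab)))
            log_post[label] = lp
        # Normalise in log space to avoid underflow.
        m = max(log_post.values())
        e = {k: math.exp(v - m) for k, v in log_post.items()}
        return e["spam"] / (e["spam"] + e["ham"])


def jaccard_distance(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def single_linkage_clusters(docs, cutoff):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the smallest single-linkage distance exceeds `cutoff`."""
    clusters = [[i] for i in range(len(docs))]

    def dist(c1, c2):
        return min(jaccard_distance(docs[i], docs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        d, a, b = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > cutoff:
            break
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def classify(email, nb, known_spam, threshold=0.5, cutoff=0.5):
    # Stage 1: naive Bayes separates legitimate mail from suspicious mail.
    if nb.spam_probability(email) < threshold:
        return "legitimate"
    # Stage 2: cluster the suspicious email together with the known spam set;
    # it is labelled spam only if it lands in a cluster containing known spam.
    docs = known_spam + [email]
    target = len(docs) - 1
    for cluster in single_linkage_clusters(docs, cutoff):
        if target in cluster:
            return "spam" if len(cluster) > 1 else "suspicious"


# Hypothetical toy data, for illustration only.
nb = NaiveBayes().fit(
    ["cheap pills buy now", "win money now free prize",
     "meeting agenda for monday", "project report attached please review"],
    ["spam", "spam", "ham", "ham"],
)
known_spam = ["buy cheap pills online now", "free prize win money today"]

for mail in ("please review the meeting agenda",
             "buy cheap pills now today",
             "free prize draw at the office party now"):
    print(mail, "->", classify(mail, nb, known_spam))
```

In this sketch an email is labelled spam only when naive Bayes flags it and it also clusters with previously collected spam; anything flagged by the first stage alone stays in the suspicious class. That is one plausible reading of how a second stage could reduce false positives on legitimate mail, not a reconstruction of the authors' procedure.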

