基于滑动窗口的优化贝叶斯邮件过滤算法 (Cited by: 4)

Improved Bayesian mail filtering algorithm based on slipping window
Abstract: In text classification, the Bayesian algorithm requires feature extraction, but traditional feature extraction algorithms do not select features accurately enough, which lowers classification performance. To address this problem, a feature selection method based on a sliding window is proposed; it enlarges the range from which features are selected. Experimental results show that the improved method effectively increases text classification precision compared with the traditional methods.
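The abstract does not describe how the window is applied, so the following Python sketch is only one possible reading, not the authors' algorithm. It assumes that a window slides one character at a time over the text of a message and that every substring it covers (widths 1 to `max_width`) becomes a candidate feature, which widens the feature pool compared with single characters or words alone. The names `window_features`, `NaiveBayes`, and `max_width`, as well as the toy training data, are hypothetical; classification uses a standard multinomial naive Bayes with Laplace (add-one) smoothing.

```python
import math
from collections import Counter, defaultdict


def window_features(text: str, max_width: int = 2) -> list[str]:
    """Hypothetical sliding-window feature extractor (an assumption, not the paper's).

    Whitespace is dropped, then windows of width 1..max_width slide one
    character at a time; every covered substring is a candidate feature,
    so no word segmenter is needed for Chinese text.
    """
    chars = [c for c in text if not c.isspace()]
    feats: list[str] = []
    for width in range(1, max_width + 1):
        for start in range(len(chars) - width + 1):
            feats.append("".join(chars[start:start + width]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self, max_width: int = 2) -> None:
        self.max_width = max_width
        self.doc_counts: Counter = Counter()                  # training messages per class
        self.feat_counts: defaultdict = defaultdict(Counter)  # feature counts per class
        self.vocab: set[str] = set()                          # all features seen in training

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        for text, label in zip(docs, labels):
            feats = window_features(text, self.max_width)
            self.doc_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label | text): log prior plus summed log likelihoods."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.feat_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for f in window_features(text, self.max_width):
            score += math.log((counts[f] + 1) / denom)  # add-one smoothing
        return score

    def predict(self, text: str) -> str:
        return max(self.doc_counts, key=lambda c: self.log_posterior(text, c))


if __name__ == "__main__":
    # Toy data (hypothetical): two spam and two legitimate ("ham") messages.
    train = ["免费发票 优惠", "发票代开 免费", "明天开会讨论项目", "项目进度会议"]
    labels = ["spam", "spam", "ham", "ham"]
    clf = NaiveBayes(max_width=2).fit(train, labels)
    print(clf.predict("免费代开发票"))  # prints "spam"
```

A message with n non-space characters yields roughly n × `max_width` features, so a wider window enlarges the candidate pool at the cost of a larger vocabulary. A practical filter would usually prune that pool with a feature-selection score before classification, a step this sketch leaves out.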
Source: Journal of Chongqing University of Posts and Telecommunications (Natural Sciences Edition), 2006, No. 4, pp. 528-531 (4 pages)
Funding: Program for New Century Excellent Talents in University (NCET); Natural Science Foundation of Chongqing (2005BB2063)
Keywords: naive Bayesian algorithm; sliding window; feature selection; mail classification