
A spam filtering method based on active Bayesian classification technology (cited by: 1)
Abstract: Current estimates indicate that nearly sixty percent of email traffic is spam, and there is little reason to expect this to change. Filtering methods that combine machine learning, text categorization, and information filtering have therefore become a research focus. In practice, however, training sets often contain a large number of unlabeled emails, and conventional classification methods must label them first, which is time-consuming and degrades filtering accuracy. This paper proposes an active Bayesian classification method, RANB, that labels the classes of the unlabeled training emails as a preprocessing step. Experiments show that, compared with classical methods, the approach effectively improves the quality of the training samples and the performance of the filter, and it performs better on each evaluation metric.
Source: Journal of Hefei University of Technology (Natural Science); indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 9, pp. 1443-1446 (4 pages)
Funding: Supported by the Natural Science Foundation of Anhui Province (050420207)
Keywords: spam; machine learning; text categorization; information filter; active learning; naive Bayes classification
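
The abstract does not describe how RANB works internally, so the following is only a minimal sketch of the general idea it names: a naive Bayes classifier that actively chooses which unlabeled training emails to have labeled. It assumes pool-based active learning with least-confident sampling, which may differ from the paper's selection rule. All names here (`train_nb`, `posterior`, `active_label`, `oracle`, `budget`) are hypothetical and introduced for illustration.

```python
import math
from collections import Counter


def train_nb(labeled):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in labeled)
    word_counts = {label: Counter() for label in class_counts}
    vocab = set()
    for tokens, label in labeled:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def posterior(model, tokens):
    """Return P(label | tokens) for every label, using Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long emails.
    top = max(log_scores.values())
    exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {lab: v / norm for lab, v in exp_scores.items()}


def active_label(labeled, unlabeled, oracle, budget):
    """Ask the oracle to label up to `budget` emails, most uncertain first.

    The model is retrained after every query. Assumption: "uncertain" means
    the smallest maximum posterior (least-confident sampling).
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(min(budget, len(pool))):
        model = train_nb(labeled)
        idx = min(
            range(len(pool)),
            key=lambda i: max(posterior(model, pool[i]).values()),
        )
        tokens = pool.pop(idx)
        labeled.append((tokens, oracle(tokens)))
    return train_nb(labeled), labeled, pool


# Toy usage: two seed emails, three unlabeled ones, a stand-in oracle.
seed = [(["cheap", "pills", "offer"], "spam"),
        (["meeting", "agenda", "monday"], "ham")]
pool = [["cheap", "offer", "now"], ["monday", "meeting"], ["offer", "agenda"]]
model, labeled, remaining = active_label(
    seed, pool, oracle=lambda toks: "spam" if "offer" in toks else "ham", budget=1
)
```

In this toy run the email `["offer", "agenda"]` is queried first, because it contains one word from each seed class and the model cannot tell the two classes apart. In a real filter the oracle would be a human annotator, and the token lists would come from a proper tokenizer and feature-selection step.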

References (11)

  • 1 Anti-Spam Center of the Internet Society of China. The Fourth China Anti-Spam Status Survey Report of 2006 [EB/OL]. http://www.anti-spam.cn, 2007-09-21.
  • 2 Androutsopoulos I, Paliouras G, Karkaletsis V, et al. Learning to filter spam e-mail: a comparison of a naive Bayesian and a memory based approach [C]//Proc 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000), 2000: 1-13.
  • 3 Carreras X, Marquez L. Boosting trees for anti-spam email filtering [C]//Proceedings of Euro Conference Recent Advances in NLP (RANLP-2001), 2001: 58-64.
  • 4 Drucker H, Wu D, Vapnik V N. Support vector machines for spam categorization [J]. IEEE Transactions on Neural Networks, 1999, 10(5): 1048-1054.
  • 5 Ji Shihao, Krishnapuram B, Carin L. Variational Bayes for continuous hidden Markov models and its application to active learning [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 522-532.
  • 6 Zhao Yue, Mu Zhichun. Research on active learning based on the query-by-committee selection method [J]. Journal of Taiyuan University of Technology, 2006, 37(4): 469-472. Cited by: 7
  • 7 Gong Xiujun, Sun Jianping, Shi Zhongzhi. Active Bayesian network classifier [J]. Journal of Computer Research and Development, 2002, 39(5): 574-579. Cited by: 37
  • 8 Liu Tao, Liu Shengping, Chen Zheng, et al. An evaluation on feature selection for text clustering [C]//Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003: 488-495.
  • 9 Yu Lei, Liu Huan. Feature selection for high dimensional data: a fast correlation based filter solution [C]//Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003: 856-863.
  • 10 Wang Bin, Pan Wenfeng. A survey of content-based spam filtering techniques [J]. Journal of Chinese Information Processing, 2005, 19(5): 1-10. Cited by: 129
