
Research and Design of a Spam Filtering System Based on Probability Inference (cited by: 1)
Abstract: Classification is one of the most important problems in machine learning and data mining research. Automatic text categorization, a research hotspot and core technique in information retrieval and data mining, has received wide attention and developed rapidly in recent years. This paper designs a spam filtering system based on an improved Bayesian probability inference method. The system uses weights obtained from probability tests to describe the correlation among data, thereby addressing the problems of inconsistency, and even mutual independence, among data. E-mail, the most widely used Internet application, has long been popular with users, but the spam problem has grown increasingly serious in recent years. The approach is applied to filtering spam on the Internet, and experiments demonstrate its effectiveness. Finally, some future research directions are given.
Source: Computer Technology and Development (《计算机技术与发展》), 2008, No. 8, pp. 76-79 (4 pages)
Funding: National Natural Science Foundation of China (60273043); Anhui University Graduate Innovation Fund (20073053)
Keywords: machine learning; text classification; spam; Bayesian method
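
The abstract describes a spam filter built on Bayesian probability inference but does not give the weighting scheme itself. As a rough illustration of the general idea only, the following is a minimal sketch of a plain naive Bayes text classifier with add-one smoothing. It is not the paper's improved method, and the function names and the toy messages are hypothetical.

```python
import math
from collections import Counter

# Toy training data, made up purely for illustration.
EXAMPLE_MESSAGES = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(messages):
    """Count words per class from (text, label) pairs; labels are 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    if doc_counts["spam"] == 0 or doc_counts["ham"] == 0:
        raise ValueError("need at least one spam and one ham message")
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_probability(text, model):
    """Return P(spam | text) under the naive (word-independence) assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior P(label).
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed estimate of P(word | label).
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long messages.
    peak = max(log_scores.values())
    weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

model = train(EXAMPLE_MESSAGES)
print(spam_probability("free money", model))
print(spam_probability("monday meeting", model))
```

A message would typically be flagged as spam when this probability exceeds a chosen threshold. Plain naive Bayes treats words as independent given the class, which is the kind of assumption that the weighting approach mentioned in the abstract appears intended to relax.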

References (5)

  • 1. Mitchell T M. Machine Learning[M]. [s.l.]: McGraw-Hill, 1997.
  • 2. Meretakis D, Dimitris F, Lu Hongjun, et al. Scalable Association-Based Text Classification[C]//Proceedings of the 9th ACM Int Conf Information and Knowledge Management (CIKM '00). Washington, US: [s.n.], 2000.
  • 3. Heckerman D. Bayesian networks for data mining[J]. Machine Learning, 1995, 20: 196-243.
  • 4. Zhang Mingfeng, Li Yunchun, Li Wei. A survey of Bayesian methods for spam filtering[J]. Application Research of Computers (计算机应用研究), 2005, 22(8): 14-19. Cited by: 23
  • 5. Wang Bin, Pan Wenfeng. A survey of content-based spam filtering techniques[J]. Journal of Chinese Information Processing (中文信息学报), 2005, 19(5): 1-10. Cited by: 129

