
A Mechanism of New Overrun Spam Detection Based on Similarity Measure

Cited by: 1
Abstract: This paper analyzes why detecting new spam matters and designs Spam-SMA, a similarity-measure algorithm for discovering new spam that uses N-grams as its comparison features. Based on Spam-SMA, it proposes a new-spam detection mechanism within a score-based, rule-driven anti-spam framework such as SpamAssassin, and implements the mechanism as an extension module for SpamAssassin. Repeated experiments on a mail server show that the mechanism detects new spam effectively.
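The abstract does not give Spam-SMA's formula, so the following is only a rough illustration of the general idea of N-gram similarity feeding a score-based rule. It compares messages by the overlap of their character N-grams using the Dice coefficient. Everything specific here is an assumption, not a detail from the paper: character trigrams (N = 3), the Dice measure, the 0.6 threshold, the 3.0 rule score, the pool of reference spam samples, and the hypothetical function names. The paper's module extends SpamAssassin itself (a Perl system); `similarity_rule_score` merely mimics how a fired rule adds its score to a message total and is not SpamAssassin's plugin API.

```python
from collections import Counter
from typing import Iterable


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of overlapping character n-grams (lowercased, whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Too short for a full n-gram: treat the whole string as one feature.
        return Counter([normalized]) if normalized else Counter()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram multisets: 2|A & B| / (|A| + |B|), in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    overlap = sum((ga & gb).values())  # Counter & keeps the minimum count per n-gram
    return 2 * overlap / total


def similarity_rule_score(message: str, reference_spam: Iterable[str],
                          threshold: float = 0.6, score: float = 3.0,
                          n: int = 3) -> float:
    """Hypothetical rule: add `score` if the message closely matches any reference spam."""
    best = max((dice_similarity(message, ref, n) for ref in reference_spam),
               default=0.0)
    return score if best >= threshold else 0.0


if __name__ == "__main__":
    reference = ["Cheap meds online, buy now and save 90%"]
    variant = "CHEAP meds online!! buy now and save 90%"
    ham = "Meeting moved to 3pm tomorrow"
    print(round(dice_similarity(variant, reference[0]), 3))
    print(similarity_rule_score(variant, reference))  # variant fires the rule
    print(similarity_rule_score(ham, reference))      # unrelated mail does not
```

Character N-grams tolerate the small edits (case changes, swapped punctuation, inserted characters) that spammers use to evade exact-match filters, which is one plausible reason to prefer them over whole-token or checksum comparison; how Spam-SMA actually weights or selects its N-grams is not stated in the abstract.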
Source: Periodical of Ocean University of China (Natural Science Edition), 2008, Issue S1, pp. 147-150 (4 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
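Funding: National High Technology Research and Development Program of China (863 Program) (2006AA01Z214); National Natural Science Foundation of China (60673159, 70671020); Program for New Century Excellent Talents in University; Key Project of Science and Technology Research of the Ministry of Education (108040); Specialized Research Fund for the Doctoral Program of Higher Education (20060145012, 20070145017); Natural Science Foundation of Liaoning Province (20062022); Program for Changjiang Scholars and Innovative Research Team in University.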
Keywords: spam; N-gram; similarity measure; score-based anti-spam system
