Clustering-Based Email Filtering Method with Hazy Category (基于聚类的类别模糊邮件过滤方法)
Abstract: Rule-based classification methods currently perform well in email filtering. When a filter is trained, however, some emails in the training set have an ambiguous (hazy) category. Extracting these borderline emails from the training set can noticeably improve classification results. This paper proposes a clustering-based filtering method that clusters the training set according to the features shared by emails whose category boundaries are ambiguous. Experiments show that the method outperforms a purely rule-based classification algorithm on all evaluation metrics.
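The abstract does not say which clustering algorithm, features, or ambiguity criterion the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: emails are represented as term-frequency vectors, plain k-means stands in for the clustering step, and a cluster counts as "hazy" when its majority label covers less than a hypothetical `min_purity` share of its members. The names `vectorize`, `kmeans`, `split_ambiguous`, and `min_purity` are illustrative and do not come from the paper.

```python
from collections import Counter
import math


def vectorize(texts):
    """Turn each text into a term-frequency vector over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [vectors[0][:]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(vectors, key=lambda v: min(dist(v, c) for c in centers))
        centers.append(far[:])
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def split_ambiguous(texts, labels, k, min_purity=0.8):
    """Return (clean, ambiguous) index lists for the training set.

    Assumption: a cluster whose majority label covers less than
    `min_purity` of its members is treated as category-ambiguous.
    """
    assign = kmeans(vectorize(texts), k)
    clean, ambiguous = [], []
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        if not idx:
            continue
        counts = Counter(labels[i] for i in idx)
        purity = counts.most_common(1)[0][1] / len(idx)
        (clean if purity >= min_purity else ambiguous).extend(idx)
    return sorted(clean), sorted(ambiguous)


if __name__ == "__main__":
    texts = [
        "cheap pills cheap pills",
        "cheap pills offer",
        "meeting agenda notes",
        "meeting agenda today",
        "weekly newsletter digest",
        "weekly newsletter update",
    ]
    labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
    clean, ambiguous = split_ambiguous(texts, labels, k=3)
    print(clean, ambiguous)  # [0, 1, 2, 3] [4, 5]
```

In this toy data the two newsletter messages share almost all their terms but carry opposite labels, so they land in one mixed cluster and are set aside. A rule-based classifier could then be trained on the remaining indices only; how the paper handles the extracted emails afterwards is not stated in the abstract.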
Source: Computer Systems & Applications (《计算机系统应用》), 2010, No. 9, pp. 147-150 (4 pages)
Funding: Anhui Provincial Fund Project (090412044)
Keywords: clustering; text categorization; spam