Abstract
Rule-based classification methods currently perform well in e-mail filtering. When an e-mail filter is trained, however, some messages in the training set have an ambiguous category. Extracting these boundary-ambiguous messages from the training set can noticeably improve classification results. This paper proposes a clustering-based filtering method that clusters the e-mail training set according to the features shared by such ambiguous messages. Experiments show that the method outperforms a purely rule-based classification algorithm on the evaluation metrics used.
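The idea of separating ambiguous training messages by clustering can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the features, or the ambiguity criterion, so everything here is an assumption: a plain k-means with hand-picked initial centroids, two hypothetical features per message (spam-word and ham-word frequency), and a hypothetical `min_purity` threshold that marks a cluster as ambiguous when its labels are mixed.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(centroids[c])
        centroids = updated
    return assignments


def split_ambiguous(points, labels, centroids, min_purity=0.8):
    """Return (clear, ambiguous) index lists.

    Assumption: a cluster whose majority label covers less than
    min_purity of its members is treated as category-ambiguous.
    """
    assignments = kmeans(points, centroids)
    ambiguous_clusters = set()
    for c in set(assignments):
        cluster_labels = [l for l, a in zip(labels, assignments) if a == c]
        majority = Counter(cluster_labels).most_common(1)[0][1]
        if majority / len(cluster_labels) < min_purity:
            ambiguous_clusters.add(c)
    clear = [i for i, a in enumerate(assignments)
             if a not in ambiguous_clusters]
    ambiguous = [i for i, a in enumerate(assignments)
                 if a in ambiguous_clusters]
    return clear, ambiguous


# Toy data: (spam-word frequency, ham-word frequency), purely illustrative.
points = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),                # clearly spam
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),                # clearly ham
    (0.50, 0.50), (0.55, 0.45), (0.45, 0.55), (0.50, 0.45),  # mixed region
]
labels = ["spam"] * 3 + ["ham"] * 3 + ["spam", "ham", "spam", "ham"]
initial_centroids = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

clear, ambiguous = split_ambiguous(points, labels, initial_centroids)
print("clear:", clear)
print("ambiguous:", ambiguous)
```

In a pipeline along these lines, only the messages indexed by `clear` would be passed on to train the rule-based classifier, while those in `ambiguous` would be set aside or handled separately.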
Source
《计算机系统应用》
2010, No. 9, pp. 147-150 (4 pages)
Computer Systems & Applications
Funding
Anhui Provincial Fund Project (090412044)
Keywords
clustering
text categorization
spam