Abstract
Although spam filtering and detection have been studied extensively, most existing approaches run at the mail client. This paper proposes a suffix-tree-based method for detecting spam on the backbone network. The method represents the text of each mail as a suffix tree, uses variable-length string matching to decide whether two mails are similar, and then uses the number of times similar mails recur to decide whether they are spam. It requires no training corpus; incoming mails are classified and counted directly. For a mail of length n, both the time complexity and the space complexity are O(n). The method is also independent of the language in which the mail is written.
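The pipeline described in the abstract (index the text, match variable-length strings, count recurrences) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it swaps in a suffix automaton, a compact relative of the suffix tree, to find the longest common substring, and the `RepeatCounter` class, the `similarity` ratio, and the `spam_count` limit are all hypothetical names and values chosen for the example. The abstract does not specify the actual similarity measure or threshold.

```python
class SuffixAutomaton:
    """Indexes every substring of `text` (stand-in for a suffix tree)."""

    def __init__(self, text):
        self.next = [{}]     # outgoing transitions per state
        self.link = [-1]     # suffix links
        self.length = [0]    # longest substring length ending in each state
        last = 0
        for ch in text:
            cur = self._new_state(self.length[last] + 1)
            p = last
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split state q so that lengths stay consistent.
                    clone = self._new_state(self.length[p] + 1)
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_state(self, length):
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        return len(self.length) - 1

    def longest_common_substring(self, other):
        """Length of the longest substring shared with `other`."""
        state, cur_len, best = 0, 0, 0
        for ch in other:
            # Fall back along suffix links until `ch` can be matched.
            while state != 0 and ch not in self.next[state]:
                state = self.link[state]
                cur_len = self.length[state]
            if ch in self.next[state]:
                state = self.next[state][ch]
                cur_len += 1
            else:
                cur_len = 0
            best = max(best, cur_len)
        return best


class RepeatCounter:
    """Hypothetical detector: flags a mail once enough similar mails were seen."""

    def __init__(self, similarity=0.8, spam_count=3):
        self.similarity = similarity   # assumed ratio, not from the paper
        self.spam_count = spam_count   # assumed repeat limit, not from the paper
        self.groups = []               # [automaton, representative length, count]

    def observe(self, mail):
        for group in self.groups:
            automaton, length, _ = group
            shared = automaton.longest_common_substring(mail)
            if shared >= self.similarity * max(1, min(length, len(mail))):
                group[2] += 1
                return group[2] >= self.spam_count
        # No similar mail seen yet: start a new group, no training needed.
        self.groups.append([SuffixAutomaton(mail), len(mail), 1])
        return False
```

Calling `observe` on each incoming mail returns `True` once the mail belongs to a group that has recurred at least `spam_count` times. Because matching works on raw characters, the sketch makes no assumption about the language of the mail, which mirrors the language independence claimed in the abstract.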
Source
《计算机工程与应用》
CSCD
PKU Core (北大核心)
2006, Issue 28, pp. 132-135, 217 (5 pages in total)
Computer Engineering and Applications