Abstract
This paper studies flow-based techniques for recognizing spam behavior. Because the communication topology of spam differs markedly from that of legitimate e-mail, it introduces the concept of similarity and proposes a spam behavior recognition method based on topological similarity. The method represents each sender and receiver by a contact list, computes the similarity between users to partition e-mail users into multiple user clusters, and judges whether a message is spam by determining which clusters its sender and receiver belong to. An auxiliary classifier is used to help classify and group the original e-mail users. Experiments on a real e-mail set show that the classification method based on topological similarity achieves good classification performance.
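The pipeline described in the abstract (contact lists, user similarity, user clusters, membership-based judgment) can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the clustering procedure, or the decision rule, so everything concrete below is an assumption made for illustration: Jaccard similarity over contact sets, a greedy single-pass grouping with a hypothetical `threshold`, and a rule that flags a message when its sender and receiver fall into different groups. The auxiliary classifier is not modeled, and all function names are hypothetical.

```python
from collections import defaultdict


def build_contacts(mail_log):
    """Map each address to its contact set (the address itself included)."""
    contacts = defaultdict(set)
    for sender, receiver in mail_log:
        contacts[sender].update((sender, receiver))
        contacts[receiver].update((sender, receiver))
    return dict(contacts)


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_users(contacts, threshold=0.6):
    """Greedy grouping: a user joins the first cluster whose seed user is
    at least `threshold` similar, otherwise starts a new cluster.
    Returns a dict mapping user -> cluster id."""
    seeds = []
    membership = {}
    for user in sorted(contacts):
        for cluster_id, seed in enumerate(seeds):
            if jaccard(contacts[user], contacts[seed]) >= threshold:
                membership[user] = cluster_id
                break
        else:
            seeds.append(user)
            membership[user] = len(seeds) - 1
    return membership


def is_suspicious(sender, receiver, membership):
    """Assumed rule: flag mail whose endpoints are unknown or lie in
    different clusters."""
    s = membership.get(sender)
    r = membership.get(receiver)
    return s is None or r is None or s != r
```

In use, `build_contacts` would be fed (sender, receiver) pairs from historical mail flow, `cluster_users` would be run once on the result, and `is_suspicious` would be called for each incoming message. A tightly connected group of correspondents ends up in one cluster, while an address that mails many otherwise unrelated recipients shares little contact overlap with any of them.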
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2012, No. 10, pp. 3805-3808 (4 pages)
Application Research of Computers
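Funding
National Natural Science Foundation of China, Joint Fund project (U0970122)
Fundamental Research Funds for the Central Universities, Science and Technology Innovation Project (SWJTU09CX040)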
Keywords
spam
behavior recognition
topology
similarity
e-mail user cluster