Abstract
With the rapid growth of information, the spam problem has become increasingly serious, and effective spam filtering has become an active research topic. This paper introduces several common spam-filtering techniques. In particular, it analyzes new tricks that spammers use to bypass content-based keyword inspection, such as mixing simplified and traditional Chinese characters, splitting Chinese characters into their components, and inserting special characters between the characters of a word. To counter several typical tricks of this kind, the paper proposes an improved Chinese word segmentation strategy and, by combining it with content-based keyword inspection, a content inspection and filtering mechanism based on feature-word expansion. Experiments show that the improved filtering model raises the detection rate for these new kinds of spam to some extent. Finally, the paper discusses prospects for applying the idea of feature-word expansion to network content security and healthy-content filtering.
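The abstract names the countermeasure but does not give its algorithm. Purely as an illustration, the Python sketch below shows one plausible way feature-word expansion could address the three tricks listed above: each keyword is expanded into every combination of per-character variants (its traditional form and a split form), and the incoming text is first stripped of whitespace, punctuation and symbols inserted between characters. The tables `TRADITIONAL` and `SPLIT` and all function names are hypothetical assumptions, not the paper's method or data; a real filter would need much larger variant dictionaries and the paper's segmentation strategy.

```python
import re
from itertools import product

# Hypothetical, tiny variant tables for illustration only (not from the paper).
# A real filter would load large simplified/traditional and decomposition lists.
TRADITIONAL = {"开": ["開"], "发": ["發"]}
SPLIT = {"代": ["亻弋"], "票": ["西示"]}


def char_variants(ch: str) -> list[str]:
    """Return the character itself plus its known traditional and split forms."""
    return [ch] + TRADITIONAL.get(ch, []) + SPLIT.get(ch, [])


def expand_keyword(keyword: str) -> set[str]:
    """Expand one feature word into every combination of per-character variants."""
    return {"".join(parts) for parts in product(*(char_variants(c) for c in keyword))}


def normalize(text: str) -> str:
    """Remove whitespace, punctuation, symbols and underscores inserted between characters."""
    return re.sub(r"[\W_]+", "", text)


def is_spam(text: str, keywords: list[str]) -> bool:
    """Flag the text if any expanded variant of any keyword occurs after normalization."""
    cleaned = normalize(text)
    return any(variant in cleaned for kw in keywords for variant in expand_keyword(kw))


if __name__ == "__main__":
    keywords = ["代开发票"]  # "issue invoices on someone's behalf", a typical spam phrase
    print(is_spam("本公司代*開-發 西示", keywords))  # True: traditional + split + inserted symbols
    print(is_spam("请查收本月的发票", keywords))      # False: legitimate use of 发票 alone
```

Expanding the keyword list up front keeps the matching step a plain substring search, but the number of variants grows as the product of per-character variant counts (16 for the four-character example here), so a practical system would likely precompute variants or use a multi-pattern matcher such as Aho-Corasick.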
Source
Network & Computer Security (《计算机安全》)
2010, No. 9, pp. 22-25 (4 pages)
Keywords
spam mail
feature expansion
Chinese character splitting
Chinese word segmentation
keyword matching
content filtering