Abstract
Spammers' main technique for disrupting or evading filter detection is to hide the true symbol distribution of a message's content by inserting noise characters, substituting text, and similar means. This paper presents a sequence method based on Conditional Random Fields (CRF) for recovering the features of spam messages. Compared with existing methods based on pattern matching, sequence alignment, and Hidden Markov Models, the proposed method is more flexible and robust in model construction. Experiments show that CRF-based feature recovery clearly improves the filtering performance of compression-based and content-based filters.
Source
《计算机应用与软件》
CSCD
2010, No. 7, pp. 67-70, 106 (5 pages in total)
Computer Applications and Software
Funding
Supported by the China Next Generation Internet (CNGI) Demonstration Project fund (CNGI-04-12-2A)
Keywords
Conditional random field
Spam
Feature recovery