Abstract
To evade security filtering, malicious actors on the Internet deform the text of harmful messages before spreading them online. To identify and filter such text, this paper analyzes the features of the deformation, preprocesses text according to word co-occurrence and differences in character encoding rules, and extracts the harmful strings that carry deformation features. Because the characters in these strings tend to appear adjacent, in order, and frequently, a self-learning algorithm based on association rules is proposed to extract security-specific keywords and update the harmful-feature dictionary. Experiments show that the method identifies deformed keywords that traditional methods miss, complements topic filtering, and improves the efficiency and feature-identification capability of content-based security filtering.
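The two stages summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that "deformation" means short runs of non-Chinese symbols inserted between Chinese characters, and it replaces the association-rule learning with a simple support and confidence test on adjacent, ordered character pairs. The function names, the regular expression, and the thresholds are all hypothetical.

```python
import re
from collections import Counter

# Assumption: the basic CJK Unified Ideographs block is enough for this sketch.
CJK = r"\u4e00-\u9fff"
# Assumption: a deformed string is a run of CJK characters, each separated
# from the next by 1 to 3 non-CJK, non-whitespace symbols.
DEFORMED = re.compile(rf"[{CJK}](?:[^{CJK}\s]{{1,3}}[{CJK}])+")


def extract_deformed_strings(text):
    """Return candidate strings with the inserted separators stripped."""
    return [re.sub(rf"[^{CJK}]", "", m.group()) for m in DEFORMED.finditer(text)]


def mine_keywords(strings, min_support=0.5, min_confidence=0.8):
    """Chain adjacent character pairs that pass support and confidence tests.

    For a pair "ab", support is the fraction of strings containing "ab" and
    confidence is that count divided by the number of strings containing "a".
    """
    n = len(strings)
    char_df = Counter()  # number of strings containing each character
    pair_df = Counter()  # number of strings containing each adjacent pair
    for s in strings:
        char_df.update(set(s))
        pair_df.update({s[i:i + 2] for i in range(len(s) - 1)})

    strong = {
        p for p, c in pair_df.items()
        if c / n >= min_support and c / char_df[p[0]] >= min_confidence
    }

    keywords = set()
    for s in strings:
        i = 0
        while i < len(s) - 1:
            j = i
            # Extend the keyword while consecutive pairs remain strong.
            while j < len(s) - 1 and s[j:j + 2] in strong:
                j += 1
            if j > i:
                keywords.add(s[i:j + 1])
                i = j
            else:
                i += 1
    return keywords


if __name__ == "__main__":
    docs = ["代@开#发$票", "代*开*发*票", "开.发.票"]
    candidates = [s for d in docs for s in extract_deformed_strings(d)]
    print(mine_keywords(candidates))
```

In this sketch the extracted keywords would be added to a filtering dictionary. A realistic system would also need the co-occurrence and encoding-based preprocessing that the abstract mentions, which is not modeled here.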
Source
《计算机工程》
CAS
CSCD
PKU Core Journal
2007, No. 21, pp. 155-156, 159 (3 pages)
Computer Engineering
Keywords
association rules
security filtering
keyword identification
transformed text