
A variant keyword matching method based on the stroke order features of Chinese characters
Abstract: In recent years, spam short messages have increasingly contained large numbers of split characters (a character written as its separate components) and visually similar characters, which lets them bypass the keyword screening of monitoring systems. Because split and similar characters are numerous and vary flexibly, adding all of them to the keyword library would make the library redundant. To address this, this paper proposes a variant keyword matching method based on the stroke order features of Chinese characters. The method first merges the split characters in a spam message based on stroke order features; it then builds an index table from the characters of the keywords to quickly find the suspected keywords a message contains; finally, it matches keywords with a proposed "pyramid matching method". The proposed method effectively reduces the redundancy of the keyword library and improves keyword matching efficiency.
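The abstract does not give the paper's data structures or the details of its pyramid matching method, so the following is only a minimal Python sketch of the first two steps under stated assumptions. The `STROKES` table is hypothetical sample data in the common five-class stroke scheme, and the functions `merge_split_chars`, `build_index`, `suspected_keywords` and the `min_ratio` parameter are illustrative names, not the authors'. The merging rule (two adjacent characters are merged when their concatenated stroke sequences equal the stroke sequence of one known character) is our assumption of how stroke order features could be used; the final pyramid matching step is not reproduced.

```python
from collections import defaultdict

# Hypothetical sample data: stroke-order codes in the common five-class scheme
# (1 horizontal, 2 vertical, 3 left-falling, 4 dot, 5 turning).
# A real system would load a complete stroke-order dictionary instead.
STROKES = {
    "女": "531",
    "子": "521",
    "好": "531521",
    "木": "1234",
    "林": "12341234",
}

# Reverse map (stroke sequence -> character) used to recognise merged pairs.
# Assumes stroke sequences are unique; on a collision the last entry wins.
BY_STROKES = {seq: ch for ch, seq in STROKES.items()}


def merge_split_chars(text: str) -> str:
    """Greedily merge adjacent characters whose concatenated stroke
    sequences equal the stroke sequence of a single known character."""
    out = []
    i = 0
    while i < len(text):
        if i + 1 < len(text):
            a, b = STROKES.get(text[i]), STROKES.get(text[i + 1])
            if a and b and (a + b) in BY_STROKES:
                out.append(BY_STROKES[a + b])
                i += 2  # consume both components
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


def build_index(keywords):
    """Index table: each keyword character -> set of keywords containing it."""
    index = defaultdict(set)
    for kw in keywords:
        for ch in kw:
            index[ch].add(kw)
    return index


def suspected_keywords(message, index, min_ratio=1.0):
    """Return (sorted) keywords for which at least min_ratio of their
    distinct characters appear somewhere in the message."""
    hits = defaultdict(set)
    for ch in message:
        for kw in index.get(ch, ()):  # .get does not insert missing keys
            hits[kw].add(ch)
    return sorted(
        kw for kw, found in hits.items()
        if len(found) >= min_ratio * len(set(kw))
    )


if __name__ == "__main__":
    # Hypothetical keyword library and message.
    index = build_index(["好运", "林场"])
    msg = merge_split_chars("女子运来了")
    print(msg)                             # 好运来了
    print(suspected_keywords(msg, index))  # ['好运']
```

The index lookup only touches keywords that share at least one character with the message, which is the kind of candidate filtering the abstract describes before the final match. Greedy pairwise merging handles only two-component splits and can also merge genuine adjacent characters (for example a real "木木"), so a practical version would need a verification step after it.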
Authors: WANG Hong-yu, DU Gang, ZHU Yan-yun, ZHANG Chen, DU Xue-tao (China Mobile Group Design Institute Co., Ltd., Beijing 100080, China)
Source: Telecom Engineering Technics and Standardization (《电信工程技术与标准化》), 2020, No. 12, pp. 14-18 (5 pages)
Keywords: variant matching; merging split characters; pyramid matching method
