Research and Application of a Decision-Tree-Based Algorithm for Recognizing Sensitive-Word Variants (cited by: 19)
Abstract: To address the low efficiency of recognizing variant forms of sensitive words in online text, this paper proposes a decision-tree-based algorithm for recognizing such variants. First, sensitive words and their variants are studied by analyzing features of Chinese characters such as structure and pronunciation. Second, a sensitive-word decision tree is built from a sensitive-word lexicon. Finally, a multi-factor improved model is used to compute the sensitivity of text from new media such as Weibo. Experimental results show that, when identifying Chinese sensitive words and their variants, the algorithm achieves recall and precision of up to 95% and 94%, respectively. Compared with an improved algorithm based on deterministic finite automata, recall and precision increase by 19.8% and 21.1%, respectively; compared with a sensitive-information decision-tree filtering algorithm, they increase by 17.9% and 18.1%, respectively. The analysis shows that the algorithm is effective for recognizing and automatically filtering sensitive-word variants.
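The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of matching text against a tree built from a sensitive-word lexicon while tolerating simple variants. It is not the authors' algorithm. The names `build_tree`, `find_sensitive`, `variant_map`, and `noise` are hypothetical, the homophone table is a toy example, and the multi-factor sensitivity model is not covered.

```python
from typing import Dict, List, Set, Tuple

END = "$end"  # marker key for a complete lexicon word; cannot clash with a single character


def build_tree(words: List[str]) -> dict:
    """Build a character-level prefix tree (nested dicts) from a lexicon."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # remember which lexicon word ends here
    return root


def find_sensitive(
    text: str,
    root: dict,
    variant_map: Dict[str, str],
    noise: Set[str],
) -> List[Tuple[int, int, str]]:
    """Return (start, end, lexicon_word) for each match found in text.

    Assumptions (illustrative only): variants are either single-character
    substitutions listed in variant_map, or lexicon words broken up by
    separator characters listed in noise.
    """
    hits: List[Tuple[int, int, str]] = []
    for start in range(len(text)):
        node = root
        matched = 0  # number of lexicon characters matched so far
        for pos in range(start, len(text)):
            ch = text[pos]
            if matched > 0 and ch in noise:
                continue  # skip separators inside a candidate match
            ch = variant_map.get(ch, ch)  # normalize a variant character
            if ch not in node:
                break
            node = node[ch]
            matched += 1
            if END in node:
                hits.append((start, pos + 1, node[END]))
    return hits


if __name__ == "__main__":
    tree = build_tree(["赌博"])
    toy_variants = {"堵": "赌", "搏": "博"}  # toy homophone table, assumed
    separators = {"*", " ", "-"}
    print(find_sensitive("今天去堵*博吗", tree, toy_variants, separators))
```

In this sketch the scan restarts at every character position, which is simple but quadratic in the worst case; an automaton with failure links would avoid the rescans at the cost of more setup.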
Authors: Yu Dunhui; Zhang Xiaoxiao; Fu Cong; Zhang Wanshan (College of Computer & Information Engineering, Hubei University, Wuhan 430062, China; Education Informationization Engineering & Technology Center of Hubei Province, Wuhan 430062, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 5, pp. 1395-1399, 1405 (6 pages)
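Funding: National Key R&D Program of China (2016YFB0800401); National Natural Science Foundation of China (61572371, 61832014); Technology Innovation Special Project of Hubei Province (Major Project) (2018ACA13).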
Keywords: sensitive word recognition; sensitive word variants; decision tree; sensitivity computation; multi-factor model