Abstract
Information-filtering algorithms are widely used. With the rapid development of social platforms such as WeChat and Weibo, short text messages have come to dominate network communication, and filtering them is increasingly important. After comparing the strengths and weaknesses of classic pattern-matching algorithms such as BF, KMP, and AC, this paper selects the DFA (deterministic finite automaton) algorithm as better suited to short-text filtering. It introduces the basic principles of DFA and proposes an improved DFA-based algorithm that raises the detection rate through sensitive-word preprocessing and optimization of the filtering process. Experimental results show that, compared with the SWDT-IFA algorithm, the improved algorithm raises precision on a Chinese dialogue dataset by 3% and lowers the false alarm rate by 0.87%, giving it high practical value.
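For readers unfamiliar with DFA-based filtering, the baseline idea can be sketched as follows. This is a minimal illustration of a plain DFA (trie) sensitive-word filter, not the improved algorithm the paper proposes; the names `build_dfa`, `find_matches`, `mask`, and the `END` marker are hypothetical and chosen only for this example, and the longest-match policy is an assumption.

```python
END = "__end__"  # hypothetical end-of-word marker; multi-character, so it never equals a single text character


def build_dfa(words):
    """Build a nested-dict DFA (trie) from an iterable of sensitive words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # Follow the existing transition or create a new state.
            node = node.setdefault(ch, {})
        if word:
            node[END] = True  # mark an accepting state
    return root


def find_matches(text, dfa):
    """Return (start, end) spans, taking the longest match at each position."""
    matches = []
    i = 0
    while i < len(text):
        node = dfa
        longest = 0
        j = i
        # Walk the automaton for as long as a transition exists.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i  # remember the longest accepted prefix
        if longest:
            matches.append((i, i + longest))
            i += longest  # resume scanning after the matched word
        else:
            i += 1
    return matches


def mask(text, dfa, symbol="*"):
    """Replace every matched span with a single-character symbol."""
    chars = list(text)
    for start, end in find_matches(text, dfa):
        chars[start:end] = symbol * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    dfa = build_dfa(["bad", "badword", "evil"])
    print(mask("a badword and evil x", dfa))  # prints "a ******* and **** x"
```

Because each character of the input follows at most one transition per state, lookup cost depends on the text length and the longest word rather than on the size of the word list, which is one reason a DFA suits short-text filtering against a large dictionary.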
Authors
关兴义
赵敏
伍文昌
GUAN Xing-yi; ZHAO Min; WU Wen-chang (Command and Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China)
Source
《软件导刊》
2023, No. 4, pp. 103-108 (6 pages)
Software Guide