Abstract
Based on an analysis of the linguistic characteristics of Lao, this paper proposes a Lao sensitive-information filtering algorithm built on a deterministic finite automaton (DFA) and a decision tree. Lao text is first segmented into words and encoded, which addresses both the differences between Lao and Chinese writing and the garbled characters that can appear when computers read and store Lao text. Drawing on the properties of decision trees, a decision tree of Lao sensitive information is then constructed; the tree does not depend on a dictionary and can be updated in real time. Using the DFA model, the algorithm detects and filters Lao sensitive information and raises alarms in real time. Experiments show that the filtering algorithm runs efficiently on Lao text and achieves good recall and precision.
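The abstract does not give implementation details, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes the text has already been segmented into units (words or syllables), stores the sensitive terms in a prefix tree, and walks that tree as a deterministic finite automaton over the segmented input. The names `Node`, `build_tree`, `add_term`, `find_matches`, and `mask` are hypothetical, and the placeholder tokens `w0`...`w3` stand in for segmented Lao units.

```python
from typing import Dict, List, Sequence, Tuple


class Node:
    """One automaton state: outgoing transitions plus an accepting flag."""

    def __init__(self) -> None:
        self.next: Dict[str, "Node"] = {}
        self.accepting = False  # True if a sensitive term ends at this state


def add_term(root: Node, term: Sequence[str]) -> None:
    """Insert one sensitive term (a sequence of units); usable at any time."""
    node = root
    for unit in term:
        node = node.next.setdefault(unit, Node())
    node.accepting = True


def build_tree(terms: Sequence[Sequence[str]]) -> Node:
    """Build the term tree from an initial list of sensitive terms."""
    root = Node()
    for term in terms:
        add_term(root, term)
    return root


def find_matches(root: Node, units: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of the longest term found at each start index."""
    matches: List[Tuple[int, int]] = []
    for start in range(len(units)):
        node = root
        longest = None
        for pos in range(start, len(units)):
            nxt = node.next.get(units[pos])
            if nxt is None:  # no transition: the automaton rejects here
                break
            node = nxt
            if node.accepting:
                longest = pos + 1
        if longest is not None:
            matches.append((start, longest))
    return matches


def mask(root: Node, units: Sequence[str], fill: str = "*") -> List[str]:
    """Replace every unit covered by a match with the fill symbol."""
    out = list(units)
    for start, end in find_matches(root, units):
        for i in range(start, end):
            out[i] = fill
    return out


# Placeholder tokens; real input would be segmented Lao text.
root = build_tree([["w1", "w2"], ["w3"]])
units = ["w0", "w1", "w2", "w3", "w1"]
spans = find_matches(root, units)
masked = mask(root, units)
```

Because `add_term` only adds transitions to the existing tree, new terms can be inserted while the filter is running, which is one simple way to read the abstract's claim of real-time updates. A real-time alarm could be triggered wherever `find_matches` returns a non-empty list; that hook is omitted here.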
Authors
王艺皓
丁洪伟
王丽清
李波
李浩
Wang Yihao; Ding Hongwei; Wang Liqing; Li Bo; Li Hao (School of Information Science and Engineering, Yunnan University, Kunming 650500, Yunnan, China; Office of Science and Technology, Yunnan University, Kunming 650500, Yunnan, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2022, No. 7, pp. 241-246, 274 (7 pages)
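Funding
National Natural Science Foundation of China (61862064, 61461053, 61461054)
Yunnan University Serving Yunnan Action Plan (C176240501007)
Provincial Department of Education Industrialization Support Project (2016CYH03)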
Keywords
Deterministic finite automaton
Decision tree
Sensitive information filtering
Lao text filtering
Internet public opinion