Abstract
This paper describes a search algorithm for real-time monitoring of network data streams. Based on finite automaton theory, it constructs a finite-state pattern-matching machine from a finite set of keywords and then uses that machine to scan a data stream of arbitrary length in a single pass, without backtracking, locating all occurrences of any keyword. The algorithm also incorporates a probability computation, which to some extent provides simple fuzzy semantic analysis of the text. It has been used in network filtering software, where it has performed well and improved processing speed.
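The single-pass, no-backtracking scan described above can be illustrated with a minimal sketch. It assumes a standard Aho-Corasick style automaton (goto edges, failure links, output sets), which matches the abstract's description but is not confirmed to be the paper's exact construction. The class name `KeywordMatcher` and the method `feed` are hypothetical, and the probability computation is omitted because the abstract does not specify it.

```python
from collections import deque


class KeywordMatcher:
    """Aho-Corasick style automaton (illustrative names, not from the paper)."""

    def __init__(self, keywords):
        # Per state: outgoing edges, failure link, keywords recognised there.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for word in keywords:
            self._add(word)
        self._build_failure_links()
        self.state = 0   # current state, preserved between chunks
        self.offset = 0  # number of characters consumed so far

    def _add(self, word):
        # Insert one keyword into the trie of goto edges.
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(word)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states already fail to the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def feed(self, chunk):
        """Consume one chunk of the stream; return (end_position, keyword) pairs.

        end_position counts characters from the start of the stream (1-based).
        """
        hits = []
        for ch in chunk:
            # Follow failure links instead of re-reading earlier input.
            while self.state and ch not in self.goto[self.state]:
                self.state = self.fail[self.state]
            self.state = self.goto[self.state].get(ch, 0)
            self.offset += 1
            hits.extend((self.offset, word) for word in self.out[self.state])
        return hits
```

Because the automaton state is kept between calls, a stream of any length can be processed chunk by chunk. For example, with the keywords he, she, his and hers, feeding "ush" and then "ers" reports she and he ending at position 4 and hers ending at position 6, even though she spans the chunk boundary.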
Source
Journal of Qiqihar University (Natural Science Edition)
2005, No. 2, pp. 37-41 (5 pages)