Abstract
The Wu-Manber multiple-pattern matching algorithm handles suffix patterns poorly, that is, cases where some patterns are suffixes of other patterns. This paper presents an improved algorithm for handling suffix patterns that reduces the number of character comparisons during matching and so improves running efficiency. Full-text retrieval experiments were run on 52,067 documents randomly selected from TREC2000, comparing the number of character comparisons made by three algorithms: the original Wu-Manber algorithm, the improved algorithm that uses suffix patterns, and a simple improvement that does not use suffix patterns. The results show that the proposed improvement reduces the number of character comparisons fairly consistently, which increases matching speed and efficiency.
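For readers unfamiliar with the baseline, the following is a minimal sketch of the standard Wu-Manber scheme with a counter for character comparisons, the metric the abstract reports. It is not the improved algorithm from the paper, whose details the abstract does not give. The function names `build_tables` and `wu_manber_search`, the block size of 2, and the use of plain dictionaries in place of hashed tables are all assumptions made for illustration.

```python
from collections import defaultdict


def build_tables(patterns, block=2):
    """Build the SHIFT table and the candidate buckets (illustrative names)."""
    m = min(len(p) for p in patterns)  # only the first m chars of each pattern are indexed
    if m < block:
        raise ValueError("every pattern must be at least `block` characters long")
    default_shift = m - block + 1  # safe shift for a block that occurs in no pattern
    shift = {}
    buckets = defaultdict(list)
    for p in patterns:
        for end in range(block, m + 1):
            key = p[end - block:end]
            # Keep the smallest shift that aligns this block with any pattern.
            shift[key] = min(shift.get(key, default_shift), m - end)
        # Patterns are grouped by the last block of their first m characters.
        buckets[p[m - block:m]].append(p)
    return m, default_shift, shift, dict(buckets)


def wu_manber_search(text, patterns, block=2):
    """Return (matches, comparisons); matches are (start_index, pattern) pairs."""
    m, default_shift, shift, buckets = build_tables(patterns, block)
    matches = []
    comparisons = 0  # character comparisons made while verifying candidates
    pos = m - 1      # index of the last character of the current window
    while pos < len(text):
        key = text[pos - block + 1:pos + 1]
        step = shift.get(key, default_shift)
        if step > 0:
            pos += step
            continue
        start = pos - m + 1
        # Shift is zero: verify every candidate pattern in this bucket.
        for p in buckets.get(key, []):
            matched = start + len(p) <= len(text)
            if matched:
                for i, ch in enumerate(p):
                    comparisons += 1
                    if text[start + i] != ch:
                        matched = False
                        break
            if matched:
                matches.append((start, p))
        pos += 1
    return matches, comparisons
```

Calling `wu_manber_search("the string is interesting", ["string", "ring", "ing"])` exercises a pattern set in which two patterns are suffixes of a third, which is the situation the abstract describes. The returned comparison count gives a baseline against which a modified verification step could be measured.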
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2006, Issue 2, pp. 47-52 (6 pages)
Journal of Chinese Information Processing
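Funding
Supported by the Key Program of the National Natural Science Foundation of China (60435020)
Supported by the Harbin Institute of Technology University Fund (HIT2002.71)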
Keywords
computer application
Chinese information processing
multiple-pattern matching
suffix pattern
string matching
full text retrieval
information retrieval