摘要
基于多关键字匹配的Sun Wu算法进行的分析,结合QS算法的思想,设计了一种改进的多关键字匹配算法:QMS(quick multi-pattern searching)。算法使用散列技术和前缀表减少发生部分匹配时实际进行的关键字比较次数。在计算跳跃距离时,充分考虑当前窗口的紧邻下一个字符带来的信息,进而使用更加精确的跳跃距离计算方法以获得更大的平均跳跃距离,从而获得更高的扫描效率和空间利用率。在真实文本上的对比实验表明,在通常应用环境中,该算法显著的缩短了扫描时间,取得了很好的效果。
An improved algorithm for matching multiple strings is suggested. The new algorithm, named QMS (Quick Multi-pattern Searching), is based on the analysis of Sun Wu algorithm and the idea of QS algorithm. QMS uses hashing and PREFIX table to decrease the number of comparisons. During the computation of the shift distance, the character closely after the current window is considered. Thus shift distances are computed with more accurate technique, and larger average shift distance is acquired, The scanning efficiency and the space utility are improved in result. Series of tests on an actual corpus show that QMS algorithm is much more efficient than Sun Wu algorithm under usual circumstances.
出处
《南京理工大学学报》
EI
CAS
CSCD
北大核心
2005年第6期735-739,共5页
Journal of Nanjing University of Science and Technology
基金
国家自然科学基金(60272088)