Abstract
The lookup efficiency of the Wu-Manber (WM) algorithm degrades as the pattern set grows. To address this, an improved algorithm, the double Hash searching Wu-Manber algorithm (DHSWM), is proposed. In the preprocessing stage, the linked-list structure of the original Hash table is replaced: patterns are stored in designated intervals of a Hash1 table using double hashing, and the Hash table stores the start position and length of each interval. The Prefix table is used to determine whether the pattern set contains a pattern whose prefix equals the prefix of the text in the current matching window. When a shift value of 0 occurs in the Shift table, the maximum distance that the matching window can slide is computed from the occurrences of the suffix at other positions in the patterns and stored in a Shift1 table. In the searching stage, double hashing is used to look up patterns within one interval of the Hash1 table, which avoids traversing overlong pattern linked lists when the pattern set is large; the sliding distance of the matching window after a match attempt is enlarged, so redundant matching operations are reduced and the search time is shortened. The results indicate that the improved algorithm significantly increases matching speed when the pattern set is large: when the number of patterns exceeds 5,000, its search time is 40% to 47% shorter than that of the WM algorithm.
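The table layout described in the abstract can be illustrated with a minimal Python sketch. It is an interpretation under stated assumptions, not the authors' implementation: the block size `B = 2`, the function names `build` and `search`, and the use of Python dictionaries in place of numeric hash functions are all hypothetical. The abstract does not specify the second hash function, so lookup inside an interval is done here by a plain linear scan rather than by double hashing. The Shift1 rule is also an assumption: it is taken to be the smallest distance from a non-final occurrence of the suffix block to the end of the window.

```python
from collections import defaultdict

B = 2  # block size; an assumption, the abstract does not state it


def build(patterns):
    """Build Shift, Shift1, Hash (start, length), Hash1 and Prefix tables."""
    m = min(len(p) for p in patterns)  # window length = shortest pattern
    default = m - B + 1                # largest safe shift
    shift, shift1 = {}, {}
    buckets = defaultdict(list)
    for p in patterns:
        for j in range(B, m + 1):
            block = p[j - B:j]
            d = m - j                  # distance from block end to window end
            shift[block] = min(shift.get(block, default), d)
            if d == 0:
                buckets[block].append(p)  # block is the window suffix
            else:
                # occurrence at a non-final position feeds Shift1
                shift1[block] = min(shift1.get(block, default), d)
    # Flatten the buckets into one array (Hash1); the Hash table keeps only
    # the start position and length of each interval, not a linked list.
    hash1, hash_tbl = [], {}
    for block, plist in buckets.items():
        hash_tbl[block] = (len(hash1), len(plist))
        hash1.extend(plist)
    prefixes = {p[:B] for p in patterns}  # Prefix table as a set
    return m, shift, shift1, hash_tbl, hash1, prefixes


def search(text, patterns):
    """Return (position, pattern) pairs for every occurrence in text."""
    m, shift, shift1, hash_tbl, hash1, prefixes = build(patterns)
    default = m - B + 1
    hits = []
    i = m  # exclusive end index of the current window
    while i <= len(text):
        block = text[i - B:i]
        s = shift.get(block, default)
        if s > 0:
            i += s
            continue
        start = i - m
        # Prefix filter: skip verification if no pattern shares the prefix.
        if text[start:start + B] in prefixes:
            lo, n = hash_tbl[block]
            for p in hash1[lo:lo + n]:  # linear scan of one interval
                if text.startswith(p, start):
                    hits.append((start, p))
        # Instead of sliding by 1 after a zero shift, use Shift1.
        i += shift1.get(block, default)
    return hits


if __name__ == "__main__":
    pats = ["announce", "annual", "annually", "nnual"]
    for pos, pat in search("CPM_annual_conference_announce_annually", pats):
        print(pos, pat)
```

The sketch assumes every pattern has at least `B` characters. Storing each bucket as a contiguous slice of `hash1` is what lets the Hash table hold only a start position and a length; a real implementation would presumably index both tables with integer hash values rather than string keys.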
Source
Journal of Central South University (Science and Technology)
EI
CAS
CSCD
PKU Core
2011, No. 12, pp. 3765-3771 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (Project No. 61073187)