摘要
提出了紧缩存储型Aho-Corasick算法变体,以异构的按需隐式存储取代同构的例行显式存储,从横向扇出压缩与纵向路径压缩2个方向入手,围绕着压缩稀疏事件表展开,当字符集大小σ=256时可将存储量缩减为原来的0.69%左右,而σ=64K时则达0.004%,即空间复杂度降为原来的(lbσ)/σ左右。依据扇出疏密程度的不同,分类采用了4种有针对性的快速事件定位方法,加之优化的失败迁移,使得存储量的大幅缩减不以速度的明显损失为代价,实验也证实了这一点。适用于需承载大型模式集和较长模式串而对时延和抖动都比较敏感的场合(如在线数据流过滤),在宽字符(如UNICODE型亚洲字符)匹配方面拥有显著优势。
A variation of Aho-Corasick algorithm via compact storage was presented, which replaced homogeneous routine explicit storage with heterogeneous requirement oriented implicit one, first started from two aspects of widthwise fan-out compression and lengthwise path compression, then expanded around the compression of sparse event table, thus reduced memory usage to about 0.69% of original one when alphabet size a = 256, 0.004% when σ = 64K, and space complexity approximate (1bσ)/σ of origin. According to different degrees of fan-out, four kinds of specific fast event location methods were adopted, plus optimized failure transitions, led to the fact that the dramatic reduction of memory usage isn't at the cost of obvious loss of speed, which was also proved by experiments. It's suitable to be applied in the cases that need holding mass set of longer patterns and are sensitive to delay and jitter(for example, online data stream filtering) and presents distinguished superiority in wide-character(such as Asian character of UNICODE type) matching.
出处
《通信学报》
EI
CSCD
北大核心
2009年第3期119-124,共6页
Journal on Communications
基金
国家重点基础研究发展计划("973"计划)基金资助项目(2007CB311100)
国家高技术研究发展计划("863"计划)基金资助项目(2007AA01Z473)~~
关键词
多模式匹配
紧缩存储
扇出压缩
路径压缩
事件定位
multiple patterns matching
compact storage
fan-out compression
path compression
event location