摘要
针对高级Aho-Corasick(AC)自动机为提高串匹配速度而造成的空间浪费问题,研究发现数据流对自动机节点的访问规律,据此提出基于数据访问特征的混合自动机构建算法Hybrid FA。分别研究了基于访问频率、访问层次以及结合上述2种特征对AC自动机的部分节点实现完全化的算法。在Snort、Clam AV、URL等真实数据集上的实验结果表明,Hybrid FA算法的存储空间低于高级AC自动机的5%。此外,结合访问频率和访问层次的改进算法在保证匹配速度的同时具有更强的数据适应性。
Despite the fast speed in multiple string matching tasks, the advanced Aho-Corasick(AC) automata wastes storage memory to a great extent. Study indicated that the automata states have specific statistical access characteristics in practice. Accordingly, a series of algorithms based on statistical characteristics for building hybrid finite automata, named Hybrid FA, are proposed. This work completes partial states of the AC automata according to different features, including access frequency, state hierarchy, and combined characteristics respectively. Experimental results on the real-world datasets like Snort, Clam AV, and URL show that the storage space of Hybrid AC is reduced to less than 5% of the space cost by the advanced AC automata. Furthermore, Hybrid FA based on combined characteristics achieves the superior performance on matching speed and robustness comparing to other proposed algorithms.
出处
《通信学报》
EI
CSCD
北大核心
2015年第7期31-39,共9页
Journal on Communications
基金
中国科学院战略性科技先导专项基金资助项目(XDA06030602)
国家高技术研究发展计划("863"计划)基金资助项目(2011AA010703)
国家自然科学基金青年基金资助项目(61202477)~~
关键词
多模式串匹配
空间优化
高级AC自动机
统计策略
节点完全化
multiple string matching
memory reduction
advanced AC automata
statistical strategy
state completing