摘要
采用规则分组的方法解决确定型有限自动机(Deterministic Finite Automata,DFA)状态爆炸问题,随着分组数目的增加,匹配效率大大降低.本文提出正则表达式的输入驱动特性理论,并基于此提出了基于规则模板的分组算法——模板有限自动机.模板有限自动机算法基于规则模板对规则集进行分组,各分组分别构建匹配引擎.理论分析和实验表明,与典型的DFA改进算法相比,预处理时间和存储空间有2~3个数量级别的缩减,且匹配效率没有明显降低.
As the group number grows,the classical signature grouping algorithm solves the DFA state explosion problem with a big decrease on matching efficiency. This paper presents a regular expression( Regex) input drive theory. According to such theory,a grouping algorithm based on signature templates,templates based finite automata( TFA),is proposed. TFA divides Regex set based on signature templates and constructs matching engines in each set. Experiment results showthat the preprocessing time and storage are reduced by 2 ~ 3 orders of magnitude compared with classical DFA improved algorithms,and TFA brings no obvious decrease on matching efficiency.
出处
《电子学报》
EI
CAS
CSCD
北大核心
2016年第1期236-240,共5页
Acta Electronica Sinica
基金
国家973重点基础研究发展计划(No.2013CB329104)
关键词
正则表达式
确定型有限自动机
分组自动机
扩展有限自动机
多维有限自动机
规则模板
regular expression
deterministic finite automata
multiple DFAs
extended finite automata
multi-dimensional finite automata
signature templates