摘要
针对基于确定有限状态自动机的匹配引擎在大规模、复杂规则下会出现状态爆炸的问题,提出正则表达式子串抽取算法。通过将子串抽取算法应用于DFA状态爆炸场景,设计基于子串抽取的正则匹配引擎。实验结果表明,该算法在单个规则上运行时间可达10 ms量级,抽取率高达99%,同时匹配引擎具有较好的稳定性和可拓展性,且匹配速度优于相关开源匹配引擎。
Aiming at the problem that the Deterministic Finite Automation(DFA)-based matching engine will explode under large-scale and complex rules,a regular expression substring extraction algorithm is proposed.A substring extraction based regular matching engine is designed by applying the substring extraction algorithm to the DFA state explosion scenario.Experimental results show that the algorithm can run on a single rule up to 10 ms,and the decimation rate is as high as 99 %.At the same time,the matching engine has better stability and scalability,and the matching speed is better than the related open source matching engine.
作者
王翔
卢毓海
马伟
刘燕兵
WANG Xiang;LU Yuhai;MA Wei;LIU Yanbing(School of Cyber Security,University of Chinese Academy of Sciences,Beijing 100049,China;Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100093,China;National Engineering Laboratory for Information Security Technologies,Beijing 100093,China)
出处
《计算机工程》
CAS
CSCD
北大核心
2019年第4期148-156,共9页
Computer Engineering
基金
国家重点研发计划(2016YFB0800303)
中国科学院信息工程研究所基础前沿项目(Y7Z0351101)
关键词
正则表达式
确定有限自动机
状态爆炸
子串抽取
匹配引擎
regular expression
Deterministic Finite Automaton(DFA)
state explosion
substring extraction
matching engine