摘要
网络深度包检测等网络应用广泛采用正则表达式匹配技术检测网络中的传输内容,正则表达式用非确定性有限自动机(NFA)或者确定性有限自动机(DFA)实现.网络应用对匹配速度要求很高,相比NFA,DFA具有确定性的匹配速度,但所有基于DFA的方法需要预先从NFA构造一个与之等价的DFA,于是DFA的构造成为系统瓶颈之一.为此通过深入探索自动机内在运行特性——NFA状态间活跃关系和NFA中导致DFA空间膨胀的因素,设计了一种NFA状态子集的编码方法和查询方法,显著减少了DFA构造过程中状态子集的查询代价.基于入侵检测与防护系统Snort中的真实规则集的实验表明,与传统的子集构造算法相比,该方法减少了88.33%~93.57%的DFA构造时间.
Regular expression matching is the foundation of many network lunctlons sucn as deep packet inspection, which is performed using either non deterministic finite automaton (NFA) or deterministic finite automation (DFA). To meet the requirement of high-speed regular expression matching, DFA has been the prevalent choice for its deterministic nature and matching efficiency. However, all these DFA- based approaches need to construct a DFA from an NFA as an intermediate step, thus the DFA construction process can be one of bottlenecks for the system. By exploring the inherent properties of finite automaton -- whether the NFA states simultaneously active and how the self-looping NFA states lead to explosion of DFA state space -- a state subset encoding and searching scheme was designed, and a new DFA construction algorithm was proposed. Through experiments based on real life pattern sets from the Snort intrusion detection system, the new DFA construction algorithm was demonstrated to reduce the running time by 88.33~93.57% compared with that of the standard subset construction algorithm.
基金
中国科学技术大学博士研究生学术新人项目资助