Efficient Regular Expression Matching Algorithm Based on DoLFA


Abstract: As the number of rules grows rapidly, the DFA (Deterministic Finite Automaton) used to represent regular expressions is prone to state-space explosion, which makes it hard to meet the real-time processing requirements of high-speed networks. This paper proposes an efficient regular expression matching algorithm that splits each regular expression into three subsets (exact strings, character classes, and character repetitions), optimizes and detects each partition separately, and then uses node information to link the resulting match signals, thereby constructing a special state machine called DoLFA (Divide-optimize-Link Finite Automata). Theoretical analysis and simulation results show that the algorithm saves substantial storage space, achieves high throughput, and scales well.
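The abstract does not give construction details, so the following is only a minimal sketch of the "divide" step, assuming a toy regex syntax (literal runs, `[...]` classes, the `.` wildcard, and a quantifier allowed only after a class or wildcard). The names `divide`, `EXACT`, `CLASS`, and `REPEAT` are hypothetical and do not come from the paper; the optimize and link steps are not modeled here.

```python
import re
from typing import List, Tuple

# Hypothetical labels for the three subsets named in the abstract.
EXACT, CLASS, REPEAT = "exact", "class", "repeat"

# Assumed toy syntax, not the paper's grammar: a character class or '.'
# optionally followed by a quantifier, or a run of literal characters.
_TOKEN = re.compile(r"""
    (?P<cls>\[[^\]]+\]|\.)
    (?P<quant>\{\d+(?:,\d*)?\}|[*+?])?
  | (?P<lit>[^\[\].*+?{}]+)
""", re.VERBOSE)


def divide(pattern: str) -> List[Tuple[str, str]]:
    """Split a pattern into (kind, text) components, left to right."""
    parts: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            # Anything outside the toy syntax (e.g. a quantified literal).
            raise ValueError(f"unsupported syntax at offset {pos}")
        if m.group("lit") is not None:
            parts.append((EXACT, m.group("lit")))
        elif m.group("quant") is not None:
            # A class or wildcard with a quantifier is a repetition.
            parts.append((REPEAT, m.group(0)))
        else:
            parts.append((CLASS, m.group("cls")))
        pos = m.end()
    return parts


if __name__ == "__main__":
    for kind, text in divide("abc[0-9]xyz.{3,5}end"):
        print(kind, text)
```

For the pattern `abc[0-9]xyz.{3,5}end`, this sketch yields the exact strings `abc`, `xyz`, and `end`, the character class `[0-9]`, and the repetition `.{3,5}`. Keeping the components in their original order is one plausible way to retain the information a later link step would need.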

Authors: DU Wenchao (杜文超), CHEN Shuqiao (陈庶樵), HU Yuxiang (胡宇翔)

Affiliation: National Digital Switching System Engineering and Technological R&D Center (国家数字交换系统工程技术研究中心)

Source: Computer Science (《计算机科学》), 2012, No. 9, pp. 14-19 (6 pages). Indexed in CSCD and the Peking University Core Journals list.

Funding: Supported by the National Basic Research Program of China (2012CB315901) and the National Key Technology R&D Program of China (2011BAH19B01).

Keywords: Deep packet inspection; Regular expression; Finite automata; Coding; Counter

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]

