A Fast Regular Expression Set Matching Algorithm Based on Bloom Filter (cited by: 4)

Abstract: The performance of regular expression searching algorithms is proportional to Lmin, the length of the shortest path from the initial state to a final state of the nondeterministic finite automaton (NFA), and inversely proportional to the size of Pref(RE), the prefix set of the language denoted by the regular expression. In general Pref(RE) is large, so locating occurrences of its elements in the target text is difficult. This paper proposes a regular expression set searching algorithm based on the Bloom Filter. Because the query time of a Bloom Filter is independent of the number of strings in the set, the algorithm can locate occurrences of Pref(RE) quickly and accurately, making search speed insensitive to the size of Pref(RE); when multiple Bloom Filters are used in parallel, it can also indirectly increase Lmin. Analysis and experimental results indicate that the algorithm considerably accelerates regular expression search, especially for regular expression sets: when Lmin is long and Pref(RE) is large, search speed improves by several times to tens of times. The algorithm is therefore suitable for fast, large-scale search of multiple regular expressions.
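The idea in the abstract, querying a Bloom Filter once per text window and running the automaton only on candidate positions, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `BloomFilter` class, the `search` function, the bit-array size, the number of hash functions, and the hand-written prefix set are all assumptions made for illustration, and Python's `re` module stands in for the NFA verification step. How the paper derives Pref(RE) from the automaton is not described in the abstract and is not shown here.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter over strings (hypothetical helper, not from the paper)."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one SHA-256 hash with the index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def search(pattern, prefixes, text):
    """Return start offsets where `pattern` matches `text`.

    `prefixes` is an assumed, precomputed set of equal-length strings such
    that every match of `pattern` starts with one of them.
    """
    width = len(next(iter(prefixes)))
    assert all(len(p) == width for p in prefixes)
    bloom = BloomFilter()
    for p in prefixes:
        bloom.add(p)
    regex = re.compile(pattern)
    hits = []
    for pos in range(len(text) - width + 1):
        # One membership query per window, whatever the size of the prefix set.
        if text[pos:pos + width] in bloom:
            # Bloom filters can report false positives, so verify with the regex.
            if regex.match(text, pos):
                hits.append(pos)
    return hits


if __name__ == "__main__":
    # Every match of (ab|cd)e+f starts with "abe" or "cde".
    print(search(r"(ab|cd)e+f", {"abe", "cde"}, "xxabeefyycdefzzabx"))
```

Because each candidate is verified, false positives from the filter cost time but do not change the result, and the per-window query cost depends on the number of hash functions rather than on how many prefixes were inserted.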

Authors: Xu Kefu (徐克付), Qi Deyu (齐德昱), Zheng Weiping (郑伟平), Qian Zhengping (钱正平)

Affiliation: Institute of Computer Architecture (计算机系统结构研究所), South China University of Technology

Source: Journal of South China University of Technology (Natural Science Edition), 2009, No. 4, pp. 37-41 (5 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).

Funding: China Postdoctoral Science Foundation (2005037582); Guangdong-Hong Kong Key Field Breakthrough Project (2005A10307007)

Keywords: regular expression matching; Bloom Filter; automaton; pattern matching

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]


