巨型多不确定串匹配完全自动机及其快速生成算法 (Cited by: 2)

Giant complete automaton for uncertain multiple string matching and its high speed construction algorithm


Abstract: In string-matching search, patterns often take the form of U-uncertain strings, V-uncertain strings, or U-V-uncertain strings that combine the two. Recognizing huge numbers of U-, V-, and U-V-uncertain strings is the key to detecting harmful information without omissions, and this includes cases in which two or more U-V-uncertain strings interleave. This paper proposes a complete automaton for matching large-scale sets of U-, V-, and U-V-uncertain strings, together with a fast algorithm for constructing it, and covers cases in which two or more uncertain strings are interleaved. It also gives the maximum number of complete automata that must run in parallel for V-uncertain strings. Ordinary regular-expression matching can suffer two kinds of omission (pretermission), and the paper identifies them as similar-connection omissions and interleaving omissions. Finally, the paper shows a further requirement for the sets of character substrings in U-uncertain strings. Taken as a whole, these sets must be made pairwise disjoint and free of homologous subsequent singular points ("homologous subsequent especial points"). If they are not, the result may be incorrect or the automaton may need more states.
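The abstract does not spell out the construction. The sketch below is only a hedged illustration. It uses the classic Aho-Corasick goto/fail construction, folded into a complete DFA in which every (state, symbol) pair has an explicit transition. This is a swapped-in textbook technique, not the authors' algorithm. It also assumes that a U-uncertain string can be modeled as a sequence of segments, each a set of alternative substrings. That reading is an assumption, and V-uncertain strings are not modeled at all. The helper names `expand_u_uncertain`, `build_complete_automaton` and `match_all` are hypothetical. The demo contrasts the automaton with Python's `re.finditer`, which returns only non-overlapping matches. This is one plausible reading of the "interleaving omission" the abstract mentions.

```python
import re
from collections import deque
from itertools import product


def expand_u_uncertain(pattern):
    """Expand a hypothetical U-uncertain pattern into plain keywords.

    ASSUMPTION: a pattern is a list of segments, each a set of alternative
    substrings, e.g. [{"ab"}, {"c", "xy"}] -> {"abc", "abxy"}.
    """
    return {"".join(parts) for parts in product(*pattern)}


def build_complete_automaton(keywords, alphabet):
    """Aho-Corasick construction folded into a complete DFA (swapped-in technique)."""
    # Every keyword character must have a row entry, so widen the alphabet.
    alphabet = set(alphabet) | {ch for kw in keywords for ch in kw}
    goto, output = [{}], [set()]          # trie edges and per-state matches
    for kw in keywords:
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(kw)

    fail = [0] * len(goto)
    queue = deque()
    for ch in alphabet:                   # complete the root row first
        nxt = goto[0].get(ch)
        if nxt is None:
            goto[0][ch] = 0
        else:
            queue.append(nxt)             # depth-1 states fail to the root
    while queue:                          # BFS: shallower rows are already complete
        state = queue.popleft()
        output[state] |= output[fail[state]]
        for ch in alphabet:
            nxt = goto[state].get(ch)
            if nxt is None:
                goto[state][ch] = goto[fail[state]][ch]
            else:
                fail[nxt] = goto[fail[state]][ch]
                queue.append(nxt)
    return goto, output


def match_all(goto, output, text):
    """Return every (start, keyword) occurrence, overlapping ones included."""
    hits, state = [], 0
    for i, ch in enumerate(text):
        state = goto[state].get(ch, 0)    # symbols outside the alphabet reset to root
        for kw in sorted(output[state]):
            hits.append((i - len(kw) + 1, kw))
    return hits


if __name__ == "__main__":
    # Hypothetical patterns: "ab" then ("c" or "xy"); "b" then "cd".
    patterns = [[{"ab"}, {"c", "xy"}], [{"b"}, {"cd"}]]
    keywords = set().union(*(expand_u_uncertain(p) for p in patterns))
    text = "zabcdabxy"
    goto, output = build_complete_automaton(keywords, set(text))
    print(match_all(goto, output, text))        # [(1, 'abc'), (2, 'bcd'), (5, 'abxy')]
    regex = "|".join(map(re.escape, sorted(keywords)))
    print([m.group() for m in re.finditer(regex, text)])  # ['abc', 'abxy']: 'bcd' missed
```

Because the DFA is complete, matching costs one table lookup per input character, with no fail-link chasing at search time. Expanding alternatives multiplies the keyword count with every segment, so this sketch only suits small examples.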

Authors: 胡玥 (Hu Yue), 高庆狮 (Gao Qingshi), 郭莉 (Guo Li), 王培凤 (Wang Peifeng)

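Affiliations: School of Information Engineering, University of Science and Technology Beijing; Institute of Computing Technology, Chinese Academy of Sciences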

Source: 《中国科学：信息科学》 (Scientia Sinica Informationis), CSCD-indexed, 2011, no. 5, pp. 552-561 (10 pages)

Funding: National Natural Science Foundation of China (grant no. 60873002); National Basic Research Program of China (973 Program, grant no. 2007CB311100)

Keywords: multiple string matching; U-uncertain strings; V-uncertain strings; U-V-uncertain strings; complete automaton

CLC classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]




