期刊文献+

巨型多不确定串匹配完全自动机及其快速生成算法 被引量:2

Giant complete automaton for uncertain multiple string matching and its high speed construction algorithm
原文传递
导出
摘要 在串匹配搜索中,字符串常常采用U-不确定串、V-不确定串及其结合的U-V-不确定串.如何识别巨量U-不确定字符串、V-不确定字符串和U-V-不确定字符串,以及两个和两个以上U-V-不确定字符串的交错情况的串匹配,是没有遗漏地检测有害信息的关键问题.本文提出一个快速检测巨量U-不确定字符串、巨量V-不确定字符串和巨量U-V-不确定字符串的多串匹配完全自动机及其快速生成方法,包括两个和两个以上不确定字符串相互交错的情况;并且给出V-不确定字符串的完全自动机的最大并行台数,指出通常正则表达式匹配可能出现相似连接和交错情况的两种遗漏,指出如果没有从整体的角度对U-不确定串中的字符子串集进行两两不相交化及无同源后续奇点化的处理,结果就可能出现错误或者增加状态数目. Multiple string matching is often completed under the presence of Uor V-uncertain-strings, or a combination of the two. Recognizing large numbers of strings with U-, V-, and U-V-uncertain-strings, including the interleaving of two or more uncertain strings, is the key to the successful detection of harmful information. This paper proposes a complete automaton and its high speed construction algorithm to detect large-scale U-, V-, and U-V-uncertain multiple strings, including two or more uncertain strings interlaced with one another. The maximum of the parallel complete automaton of the V-uncertain string is also reported. Finally, this study reveals that two kinds of pretermissions, a similarly connected pretermission and interlaced string pretermission, may appear in the matching of the regular expressions. The result of this maybe mistake or the number of states in the automaton may be increased, if the intersection of the U-uncertain strings sets, and the "homologous subsequent especial point" in the U-uncertain strings sets, never be eliminated from whole system.
出处 《中国科学:信息科学》 CSCD 2011年第5期552-561,共10页 Scientia Sinica(Informationis)
基金 国家自然科学基金(批准号:60873002) 国家重点基础研究发展计划(批准号:2007CB311100)资助项目
关键词 多串匹配 U-不确定串 V-不确定串 U-V-不确定串 完全自动机 multiple string matching U-uncertain-strings V-uncertain-strings U-V-uncertain-strings complete automaton
  • 相关文献

参考文献4

二级参考文献48

  • 1[1]RS Boyer, J S Moore. A fast string searching algorithm.Communications of ACM, 1977, 20(10): 762~772
  • 2[2]A Aho, M Corasick. Efficient string matching: An aid to biliographic search. Communications of ACM, 1975, 18(6): 333~ 340
  • 3[3]B Commentz-Walter. A string matching algorithm fast on average.In: H A Maurer ed. Proc of the 6th Int'l Colloquium on Automata, Languages, and Programming, LNCS 71. Berlin:Springer, 1979. 118~132
  • 4[5]E Ukkonen. On-line construction of suffix trees. Algorithmica,1995, 14(3): 249~260
  • 5[6]Bruce W Watson. The performance of single-keyword and multiple-keyword pattern matching algorithms. Eindhoven University of Technology, Eindhoven, the Netherlands, Tech Rep: 94/19, 1994
  • 6Gonzalo Navarro and Mathieu Raffinot, Flexible Pattern Matching in Strings[ M ]: Practical on-line search algorithms for texts and biological sequences, Cambridge University Press, 2002, ISBN 0 - 521 - 81307 - 7.
  • 7D.E.Knuth,J.H.Morris,V.R.Pratt, Fast Pattern Matching in Strings[J]. SIAM Journal on Computing,1977,323 -350.
  • 8A.V. Aho and M. J. Corasick, Efficient string matching: an aid to bibliographic search[ J], Communication of the ACM, 1975,18(6) :333 - 340.
  • 9S. Wu, U. Manber, Fast text searching allowing errors[J], Communications of the ACM, 1992,35(10) :83 - 91.
  • 10R.S.Boyer, J.S.Moore, A fast string searching algorithm [J], Communications of the ACM, 1977,20(10) :762 -772.

共引文献51

同被引文献21

  • 1贺龙涛,方滨兴,余翔湛.一种时间复杂度最优的精确串匹配算法[J].软件学报,2005,16(5):676-683. 被引量:25
  • 2上下文无关语言的数学理论[M].陈力行,译.济南:山东大学出版社.1986:52-63
  • 3蒋宗礼,姜守旭.形式语言与自动机理论(第2版)[M].北京:清华大学出版社,2007.
  • 4Navarro G, Raffinot M. Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences. Cambridge.. Cambridge University Press, 2002.
  • 5Aho A V, Corasick M J. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 1975, 18(6) : 333-340.
  • 6Coit C J, Staniford S, MeAlerney J. Towards faster string matching for intrusion detection or exceeding the speed of snort//Proceedings of the DARPA Information Survivability Conference& Exposition II. Anaheim, USA, 2001:367-373.
  • 7Wu Sun, Manber U. A fast algorithm for multi-pattern searching. University of Arizona, Tucson: Technical Report TR-94-17, 1994.
  • 8Song Tian, Wang Dongsheng. A path combinational method for multiple pattern matching//Proceedings of the 5th ACM/ IEEE Symposium on Architectures for Networking and Communications Systems. Princeton, USA, 2009:76-77.
  • 9Piyaehon P, Luo Yan. Efficient memory utilization on network processors for deep packet inspection//Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, USA, 2006:71-80.
  • 10Dharmapurikar S, Lockwood J. Fast and scalable pattern matching for content filtering//Proceedings of the Symposium on Architectures for Networking and Communications. Princeton, USA, 2005:183-192.

引证文献2

二级引证文献3

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部