期刊文献+

串匹配算法中模式串与文本之间关系的研究 被引量:16

Research on Relationship Between Patterns and Text in String Matching Algorithms
下载PDF
导出
摘要 经典的串匹配算法设计和分析中假设"字符互相独立并且等概率出现",这与实际应用环境差异很大,导致出现很多问题.考虑了字符的概率分布和上下文的关联,同时兼顾应用的方便,提出了命中密度的概念.在给出基本定义和扩展定义后,通过对4种类型的代表性算法的理论和实验分析,给出了命中密度与算法性能之间的关系.同时,在对命中密度的分析中得出一些极具价值的结论.对命中密度概念的多角度理解以及对它与算法性能关系的深入剖析都说明,命中密度作为一个特征量,可以从一个侧面刻画模式串和文本之间的相关性,它对算法的设计和分析以及串匹配领域研究工作的扩展都具有指导意义. It was assumed that the pattern and text characters are independent and uniformly distributed over a finite alphabet in classical string matching algorithms, and this assumption differs from real applications and causes many problems. Considering the probability distributions, the contexts of the characters, and the convenience of applications, this paper gives a concept hit rate and four extended concepts about it. Then it gives the theory analysis and detailed experiments with hit rate on the four classical algorithms. The map of the relationships is obtained between the hit rate and the algorithms' performance, and at the same time some valuable conclusions are made through above work. As a character variable, hit rate describes the relativity of patterns and text and can serve as guidelines in the algorithms design, analysis and some other extended research fields of the string matching.
出处 《软件学报》 EI CSCD 北大核心 2010年第7期1503-1514,共12页 Journal of Software
基金 国家重点基础研究发展计划(973)No.2007CB311100~~
关键词 串匹配 字符概率分布 字符串相关性 string matching probability distributions of character relativity of strings
  • 相关文献

参考文献4

二级参考文献21

  • 1王素琴,邹旭楷.一种优化的并行汉字/字符串匹配算法[J].中文信息学报,1995,9(1):49-53. 被引量:4
  • 2[1]RS Boyer, J S Moore. A fast string searching algorithm.Communications of ACM, 1977, 20(10): 762~772
  • 3[2]A Aho, M Corasick. Efficient string matching: An aid to biliographic search. Communications of ACM, 1975, 18(6): 333~ 340
  • 4[3]B Commentz-Walter. A string matching algorithm fast on average.In: H A Maurer ed. Proc of the 6th Int'l Colloquium on Automata, Languages, and Programming, LNCS 71. Berlin:Springer, 1979. 118~132
  • 5[5]E Ukkonen. On-line construction of suffix trees. Algorithmica,1995, 14(3): 249~260
  • 6[6]Bruce W Watson. The performance of single-keyword and multiple-keyword pattern matching algorithms. Eindhoven University of Technology, Eindhoven, the Netherlands, Tech Rep: 94/19, 1994
  • 7Boyer RS, Moore JS. A fast string searching algorithm[ M]. Communications of the ACM20, 1977. 762- 772.
  • 8Sun W, Manber U. A Fast Algorithm For Multi-pattern Searching[ D]. The Computer Science Department of The University of Arizona, 1994.
  • 9Sun W, Manber U. Agrep-A Fast Approximate Pattem-matching Tool[M]. Usenix Winter Technical Conference, 1992.
  • 10Kim S. A Fast Multiple String - Pattern Matching Algorithm [ A ] .17th AoM/IAoM International Conference on Computer Science[ C].San Diego CA, August 1999.

共引文献54

同被引文献136

引证文献16

二级引证文献42

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部