期刊文献+

XML内容筛选中的快速串匹配算法 被引量:3

A Fast String Matching Algorithm in Content-based XML Filtering
下载PDF
导出
摘要 本文提出了一种对XML文本进行快速串匹配的算法 -XMatch。在对于XML文本的含路径信息的模式串匹配中 ,由于XML文本的结构化特点 ,使得传统的串匹配算法不能直接有效的使用 ;而现有的大部分XML内容筛选方法都是基于SAX分析的事件驱动过程 ,效率普遍较低。XMatch在对XML文本的结构 -schema进行分析的同时 ,结合模式串的路径信息 ,建立一个扫描自动机的有限状态自动机 ;此外 ,算法还支持带循环引用路径信息的模式串匹配。XMatch容易扩展 ,可以支持普通的结构化文本的串匹配。实验结果显示 ,本算法的效率比使用SAX事件驱动的方法有明显的提高。 We propose an algorithm to do fast string match of XML files XMatch. In the pattern string matching of XML files which contain path information, traditional string match algorithms cant be effectively directly used due to the structured characteristics of XML files; Most of the available methods of XML content filtering are based on SAX event driven which is not very efficient. When analyzing schema the structure of XML files, XMatch utilizes the path information of pattern string to construct a DFA ; In addition, the algorithm support pattern matching with loop reference path information. XMatch is scalable and can support string matching of common structure text. Experiment results show that, the efficiency is distinctly improved compared with using the method of SAX event driven.
作者 刘萍 谭建龙
出处 《中文信息学报》 CSCD 北大核心 2005年第2期20-27,共8页 Journal of Chinese Information Processing
基金 国家"8 6 3"计划资助项目 (2 0 0 2AA14 2 110 )
关键词 计算机应用 中文信息处理 XML数据处理 串匹配 多关键词匹配 computer application Chinese information processing XML data processing string matching multiple keyword matching
  • 相关文献

参考文献17

  • 1Gonzalo Navarro and Mathieu Raffinot, Flexible Pattern Matching in Strings[ M ]: Practical on-line search algorithms for texts and biological sequences, Cambridge University Press, 2002, ISBN 0 - 521 - 81307 - 7.
  • 2D.E.Knuth,J.H.Morris,V.R.Pratt, Fast Pattern Matching in Strings[J]. SIAM Journal on Computing,1977,323 -350.
  • 3A.V. Aho and M. J. Corasick, Efficient string matching: an aid to bibliographic search[ J], Communication of the ACM, 1975,18(6) :333 - 340.
  • 4S. Wu, U. Manber, Fast text searching allowing errors[J], Communications of the ACM, 1992,35(10) :83 - 91.
  • 5R.S.Boyer, J.S.Moore, A fast string searching algorithm [J], Communications of the ACM, 1977,20(10) :762 -772.
  • 6B. Commentz-Walter, A string matching algorithm fast on the average[ A], In: Proceeding s of the 6th International Colloquium on Automata, Language and Programming , number 71 in Lecture Notes in Computer Science[C], 1979,118- 132.
  • 7S.Wu, U.Manber, A fast algorithm for multi-pattern searching[R] , Report TR- 94- 17 , Department of Computer Science , University of Arizona, Tucson, AZ , 1994.
  • 8M.Crochemore, A.Czumaj, L. Gasienniec, S.Jarominek, T. Lecroq, W. Plandowski, W. Rytter, Speeding up two string matching algorithms[ J], Algorithmica, 1994,12(4/5): 247 - 267.
  • 9C. Allauzen, M. Crochemore, M. Raffinot, Efficient experimental string matching by weak factor recognition[ A], In:proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, number 2089 in Lecture Notes in Computer Science[ C], pages51 - 72. Springer-Verlag, 2001.
  • 10A. Blumer, J. Blumer, A. Ehrenfeucht, D. Haussler, R. McConnel. Complete inverted files for efficient text retrieval and analysis[J]. Jonual of the ACM, 1987,34(3):578- 595.

同被引文献23

  • 1宋华,戴一奇.一种用于内容过滤和检测的快速多关键词识别算法[J].计算机研究与发展,2004,41(6):940-945. 被引量:22
  • 2贺龙涛,方滨兴,余翔湛.一种时间复杂度最优的精确串匹配算法[J].软件学报,2005,16(5):676-683. 被引量:25
  • 3Roesh M. Sno, t hghtweight intrusion detection for Networks. In: Proceedings of the 13th Systems Administration Conference, USENIX, Seattle, Washington, USA, 1999. 229-238.
  • 4ClamAV. Clam AntiVirus. http://www. clamav. net: ClamAV, 2002.
  • 5Fisk M, Varghese G. An analysis of fast string matching applied to content-based forwarding and intrusion detection: [technical report CS2001-0670]. San Diego: University of California-San Diego, 2002.
  • 6Jari K, Leena S, Jorma T. Tuning string matching for huge pattern sets. In: Proceedings of the 14th Annual Combinatorial Pattern Matching (CPM) Symposium, Morelia, Mexico,2003. 211-224.
  • 7Aho A V, Corasick M J. Efficient string matching: an aid to bibliographic search. Commuaications of the ACM, 1975,18 (6) : 333-340.
  • 8Coit C J, Stanfford S, McAlemey J. Towards faster string matching for intrusion detection or exceeding the speed of Snort. In: Proceedings of the DARPA Information Survivability Conference and Exposition Ⅱ (DISCEX'01), Los Alamitos, CA, USA, 2001. 367-373.
  • 9Boyer R, Moore J. A fast string searching algorithm. Communications of the ACM, 1977,20(10) :762-772.
  • 10Wu S, Manber U. A fast algorithm for multi-pattern searchirg: [technical report TR-94-17]. Tucson: University of Arizona, 1994.

引证文献3

二级引证文献8

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部