期刊文献+

基于后缀树的带有通配符的模式匹配研究 被引量:7

Pattern Matching with Wildcards Based on Suffix Tree
下载PDF
导出
摘要 由于在生物序列分析、文本索引、网络入侵检测等领域的应用需求,带有通配符的模式匹配问题一直是研究的热点。针对已有的研究工作中通配符和长度约束具有较强的局限性问题,研究带有灵活通配符的模式匹配问题,其中通配符可以在模式的任意两子串间出现且可以指定灵活的长度约束。采用非线性数据结构——后缀树,设计了求解模式所有解的完备算法PAST。预处理阶段采用在线增量式算法构建具有文本先验知识的后缀树,搜索阶段结合动态规划的思想,逐个匹配模式中字符,最终得到完备解。在基因序列上的实验表明,PAST比其他算法具有更好的时间性能。 Pattern matching with wildcards is a hot research problem that can be used in biological sequence analysis,text indexing,network intrusion detection,and so on.Aiming at the problem that the wildcards have strong limitations in the existing research work,pattern matching with flexible wildcards was studied.The wildcards can appear between any two substrings and can be specified with flexible length constraints.The nonlinear data structure—suffix tree was used to design a completeness algorithm PAST.In the prepare process,an online incremental algorithm was used to build the suffix tree which has priori knowledge of the text.In the search phase,the idea of dynamic programming was used to match the characters of the pattern.Experiments on DNA sequences show that our method has better perfor-mances in time than the related matching algorithm.
出处 《计算机科学》 CSCD 北大核心 2012年第12期177-180,194,共5页 Computer Science
基金 国家"863"计划课题(2012AA011005) 国家博士后科学基金(2012M511403) 安徽省自然科学基金(11040606M134) 中央高校基本科研基金(2010HGXJ0714)资助
关键词 模式匹配 通配符 后缀树 Pattern matching Wildcards Suffix tree
  • 相关文献

参考文献17

  • 1Altschul S F, Gish W, Miller W, et al. Basic local alignment search tool [J]. Journal of Molecular Biology, 1990, 215 (3): 403-410.
  • 2Fischer M J,Paterson M S. String matching and other products [J]. Complexity of computation, Massachusetts Institute of Technology, 1974,7:113-125.
  • 3Don A, ZhelevaE, GregoryM, et al. Discovering interesting usa- ge pawterns in text collections: integrating text mining with visualization[A]//Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM'07)[C]. New York:ACM, 2007 : 213- 222.
  • 4Yang Jiong, Wang Wei, Yu P S. Mining asynchronous periodic patterns in time series data [J]. IEEE Transactions on Know- ledge and Data Engineering, 2003,15 (3) : 613-628.
  • 5Tanbeer S K, Ahmed C F, Jeong B-S, et al. Efficient frequent pattern mining over data streams[A]//Proceeding of the 17th ACM Conference on Information and Knowledge Management (CIKM'08)[C]. Califomia:ACM, 2008:1447-1448.
  • 6Cole R, Gottlieb LA, Lewenstein M. Dictionary matching and in- dexing with errors and don' t cares [ A]//Proceedings of the 36th ACM Symposium on the Theory of Computing[C]. New York: ACM, 2004:91-100.
  • 7Manber U, Baeza-Yates R. An algorithm for string matching with a sequence of don't cares [J]. Information Processing Let- ters, 1991,37(3) : 133-136.
  • 8Min Fan, Wu Xiwdong, Lu Zhen-yu. Pattern Matching with In- dependent Wildcard Gaps[A]//2009 Eighth IEEE International Conference on Dependable[C]. 2009 : 194 199.
  • 9霍红卫,王小武.DNA序列中基于适应性后缀树的重复体识别算法[J].计算机学报,2010,33(4):747-754. 被引量:4
  • 10Ukkonen E. On-line Construction of Suffix Trees [J]. Algorith- mica, 1995,1 (14) : 249-260.

二级参考文献31

  • 1Lander E S, Linton L M, Birren Bet al, Initial sequencing and analysis of the human genome. Nature, 2001, 409 (6822) : 860-921.
  • 2Saha Surya, Bridges Susan, Magbanua Zenaida V, Peterson Daniel G. Empirical comparison of ab initio repeat finding programs. Nucleic Acids Research, 2008, 36(7) : 2284-2294.
  • 3Lefebvre A, Leeroq T, Dauchel H, Alexandre J. FORRepeats: Detects repeats on entire chromosomes and between genomes. Bioinformatics, 2003, 19(3): 319-326.
  • 4Jones Nell C, Pevzner Pavel A. Introduction to Bioinformatics Algorithms. Cambridge, Massachusetts: MIT Press, 2004.
  • 5Huntington's Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded an unstable on Huntington's disease chromosomes. Cell, 1993, 72(6), 971-983.
  • 6Bergman Casey M, Quesneville Hadi. Discovering and detecting transposable elements in genome sequences. Briefings in Bioinformatics, 2007, 8(6) : 382-392.
  • 7Pevzner P A, Tang H, Tesler G. De novo repeat classification and fragment assembly. Genome Research, 2004, 14 (9): 1786-1796.
  • 8Kurtz S, Schleiermacher C. REPuter: Fast computation of maximal repeats in complete genomes. Bioinformatics, 1999, 15(5): 426-427.
  • 9Price A L, Jones N C, Pevzner P A. De novo identification of repeat families in large genomes. Bioinformatics, 2005, 21 (Supplement) : i351-i358.
  • 10Edgar R, Myers E. Piler: Identification and classification of genomic repeats. Bioinformatics, 2005, 21 (Supplement) : i152-i158.

共引文献23

同被引文献98

引证文献7

二级引证文献6

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部