期刊文献+

带任意长度通配符的模式匹配 被引量:5

Pattern Matching with Arbitrary-length Wildcards
下载PDF
导出
摘要 基因序列中,许多病毒并不是简单的直接复制自己,而是相邻字符间插入或者删除序列片段,如何从序列数据中检索这些病毒具有重要的研究价值.提出了一个更普遍的问题,带任意长度通配符的模式匹配问题(Pattern matching with arbitrary-length wildcards,PMAW),这里模式中不仅可以有多个通配符约束,而且每个通配符的约束可以是两个整数,也可以从整数到无穷大.给定序列S和带通配符的模式P,目标是从S中检索P的所有出现和每一次出现的匹配位置,并且要求任意两次出现不能共享序列中同一位置.为了有效地解决该问题,设计了两个基于位并行的匹配算法MOTW(Method of ocurrence then window)算法和MWTO(Method of window then ocurrence)算法.同时,MWTO算法进行细微改动就可以满足全局长度约束.实验结果既验证了算法求解问题的正确性,又验证了比相关的模式匹配算法具有更好的时间性能. In genetic sequences, many viruses rarely reproduce themselves, but rather appear with a slightly different form in each of the occurrences. That is, sequence fragments may be inserted or deleted in adjacent characters. How to search for these viruses from the sequences has become an important research task. The paper presents a more general problem, named pattern matching with arbitrary-length wildcards (PMAW). Here, a pattern can have many wildcard constraints where the range of the wildcards may vary between two integer bounds or from an integer lower bound to infinity. Given sequence S and pattern P with arbitrary-length wildcards, this paper aims to search for all occurrences of P in S, and locate matching positions of each occurrence, where any two occurrences can not share the same position of S. In order to solve the problem effectively, two algorithms, MOTW (Method of ocurrence then window) and MWTO (Method of window then ocurrence), are proposed based on the bit-parallel technique. The MWTO algorithm can also meet the global length constraint with a minor modification. Experimental results validate the correctness of the proposed algorithms, and show that they perform better than the existing pattern matching algorithms.
出处 《自动化学报》 EI CSCD 北大核心 2014年第11期2499-2511,共13页 Acta Automatica Sinica
基金 国家自然科学基金(61229301) 国家重点基础研究发展计划(973计划)(2013CB329604) 教育部长江学者和创新团队发展计划(IRT13059) 中国博士后科学基金(2013M541822)资助~~
关键词 通配符 模式匹配 位并行 基因序列 Wildcard, pattern matching, bit-parallel, genetic sequence
  • 相关文献

参考文献33

  • 1Pisanti N, Crochemore M, Grossi R, Sagot M F. Bases of motifs for generating repeated patterns with wildcards. IEEE/ACM Transactions on Computational Biology Bioinformatics, 2005, 2(1): 40-50.
  • 2岳峰,孙亮,王宽全,王永吉,左旺孟.基因表达数据的聚类分析研究进展[J].自动化学报,2008,34(2):113-120. 被引量:25
  • 3Rahman M S, Iliopoulos C S, Lee I, Mohamed M, Smyth W F. Finding patterns with variable length gaps or don't cares. Computing and Combinatorics. Berlin, Heidelberg: Springer, 2006. 146-155.
  • 4Ding B L, Lo D, Han J W, Khoo S C. Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of the 25th International Conference on Data Engineering. Shanghai, China: IEEE, 2009. 1024-1035.
  • 5Wu X D, Zhu X Q, He Y, Arslan A N. PMBC: Pattern mining from biological sequences with wildcard constraints. Computers in Biology and Medicine, 2013, 43(5): 481-492.
  • 6王宝勋,刘秉权,孙承杰,王晓龙,孙林.基于论坛话题段落划分的答案识别[J].自动化学报,2013,39(1):11-20. 被引量:7
  • 7潘云鹤,王金龙,徐从富.数据流频繁模式挖掘研究进展[J].自动化学报,2006,32(4):594-602. 被引量:34
  • 8Tanbeer S K, Ahmed C F, Jeong B S, Lee Y K. Efficient frequent pattern mining over data streams. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. California, USA: ACM, 2008. 1447-1448.
  • 9Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215(3): 403-410.
  • 10Fischer M J, Paterson M S. String-matching and other products. Proceeding of the 7th SIAM AMS Complexity of Computation. Cambridge, USA, 1974. 113-125.

二级参考文献79

  • 1金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 2邹翔,张巍,刘洋,蔡庆生.分布式序列模式发现算法的研究[J].软件学报,2005,16(7):1262-1269. 被引量:19
  • 3[1]Brown P O,Botstein D.Exploring the new world of the genome with DNA microarrays.Nature Genetics,1999,21(1):33-37
  • 4[2]Jain A K,Murty M N,Flynn P J.Data clustering:a review.ACM Computing Surveys,1999,31(3):264-323
  • 5[3]Schena M,Shalon D,Davis R W,Brown P O.Quantitative monitoring of gene expression patterns with a complementary DNA microarray.Science,1999,270(5235):467-470
  • 6[4]Schena M,Scalon D,Heller R.Parallel human genome analysis:microarray-based expression monitoring of 1000 genes.Proceedings of the National Academy of Sciences of the United States of America,1996,93(20):10614-10619
  • 7[5]Ramsay G.DNA chips:state-of-the art.Nature Biotechnology,1998,16(1):40-44
  • 8[6]Lockhart D J,Dong H,Byrne M C,Follettie M T,Gallo M V,Chee M S.Expression monitoring by hybridization to high-density oligonucleotide arrays.Nature Biotechnology,1996,14(13):1675-1680
  • 9[7]Lipshutz R J,Fodor S P,Gingeras T R,Lockhart D J.High density synthetic oligonucleotide arrays.Nature Genetics,1999,21(1):20-24
  • 10[8]Harrington C A,Rosenow C,Retief J.Monitoring gene expression using DNA microarrays.Current Opinion in Microbiology,2000,3(3):285-291

共引文献103

同被引文献35

  • 1李志秀,张军,陈光,杨丽红.JQuery Ajax异步处理JSON数据在项目管理系统中的应用[J].云南大学学报(自然科学版),2011,33(S2):247-250. 被引量:29
  • 2张译,刘衍珩,田大新,李川川,王媛.基于关联规则的入侵检测系统[J].吉林大学学报(信息科学版),2006,24(2):204-209. 被引量:11
  • 3张涛,黄强,毛磊雅,高兴.一个基于JSON的对象序列化算法[J].计算机工程与应用,2007,43(15):98-100. 被引量:57
  • 4Fischer M J,Paterson M S.String matching and other products[R].Cambridge,MA:Massachusetts Institute of Technology ,1974.
  • 5Manber U,Baeza-Yates R.An Algorithm for String Matching with a Sequence of Don’t cares[J].Information Processing Letters,1991,37(3):133-136.
  • 6Califf M E,Mooney R J.Bottom-up Relational Learning of Pattern Matching Rules for Information Extraction[J].Journal of Machine Learning Research,2003,4(6):177-210.
  • 7Guo Dan,Hu Xue-gang,Xie Fei,et al.Pattern Matching withWildcards and Gap-length Constraints Based on a Centrality-degree Graph[J].Applied Intelligence,2013,39:57-74.
  • 8Dahiya A,Garg D.Maximal pattern matching with flexible wildcard gaps and one-off constraint[C]∥International Conference on Advances in Computing,Communications and Informatics.Arras,France,2014:1107-1112.
  • 9Chen Gong,Wu Xin-dong,Zhu Xing-quan,et al.Efficient string matching with wildcards and length constraints[J].Knowledge and Information Systems,2006,10(4):399-419.
  • 10Min Fan,Wu Xin-dong,Lu Zhen-yu.Pattern matching with independent wildcard gaps[C]∥Proceedings of the 8th IEEE International Conference on Dependable,Autonomic and Secure Computing.2009:194-199.

引证文献5

二级引证文献13

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部