基因序列中,许多病毒并不是简单的直接复制自己,而是相邻字符间插入或者删除序列片段,如何从序列数据中检索这些病毒具有重要的研究价值.提出了一个更普遍的问题,带任意长度通配符的模式匹配问题(Pattern matching with arbitrary-lengt...基因序列中,许多病毒并不是简单的直接复制自己,而是相邻字符间插入或者删除序列片段,如何从序列数据中检索这些病毒具有重要的研究价值.提出了一个更普遍的问题,带任意长度通配符的模式匹配问题(Pattern matching with arbitrary-length wildcards,PMAW),这里模式中不仅可以有多个通配符约束,而且每个通配符的约束可以是两个整数,也可以从整数到无穷大.给定序列S和带通配符的模式P,目标是从S中检索P的所有出现和每一次出现的匹配位置,并且要求任意两次出现不能共享序列中同一位置.为了有效地解决该问题,设计了两个基于位并行的匹配算法MOTW(Method of ocurrence then window)算法和MWTO(Method of window then ocurrence)算法.同时,MWTO算法进行细微改动就可以满足全局长度约束.实验结果既验证了算法求解问题的正确性,又验证了比相关的模式匹配算法具有更好的时间性能.展开更多
Pattern matching with wildcards(PMW)has great theoretical and practical significance in bioinformatics,information retrieval,and pattern mining.Due to the uncertainty of wildcards,not only is the number of all matches...Pattern matching with wildcards(PMW)has great theoretical and practical significance in bioinformatics,information retrieval,and pattern mining.Due to the uncertainty of wildcards,not only is the number of all matches exponential with respect to the maximal gap flexibility and the pattern length,but the matching positions in PMW are also hard to choose.The objective to count the maximal number of matches one by one is computationally infeasible.Therefore,rather than solving the generic PMW problem,many research efforts have further defined new problems within PMW according to different application backgrounds.To break through the limitations of either fixing the number or allowing an unbounded number of wildcards,pattern matching with flexible wildcards(PMFW)allows the users to control the ranges of wildcards.In this paper,we provide a survey on the state-of-the-art algorithms for PMFW,with detailed analyses and comparisons,and discuss challenges and opportunities in PMFW research and applications.展开更多
文摘基因序列中,许多病毒并不是简单的直接复制自己,而是相邻字符间插入或者删除序列片段,如何从序列数据中检索这些病毒具有重要的研究价值.提出了一个更普遍的问题,带任意长度通配符的模式匹配问题(Pattern matching with arbitrary-length wildcards,PMAW),这里模式中不仅可以有多个通配符约束,而且每个通配符的约束可以是两个整数,也可以从整数到无穷大.给定序列S和带通配符的模式P,目标是从S中检索P的所有出现和每一次出现的匹配位置,并且要求任意两次出现不能共享序列中同一位置.为了有效地解决该问题,设计了两个基于位并行的匹配算法MOTW(Method of ocurrence then window)算法和MWTO(Method of window then ocurrence)算法.同时,MWTO算法进行细微改动就可以满足全局长度约束.实验结果既验证了算法求解问题的正确性,又验证了比相关的模式匹配算法具有更好的时间性能.
基金supported in part by the National Natural Science Foundation of China under Grant Nos.61229301 and 60828005the Program for Changjiang Scholars and Innovative Research Team in University(PCSIRT)of the Ministry of Education,China,under Grant No.IRT13059the National Science Foundation(NSF)of USA under Grant No.0514819
文摘Pattern matching with wildcards(PMW)has great theoretical and practical significance in bioinformatics,information retrieval,and pattern mining.Due to the uncertainty of wildcards,not only is the number of all matches exponential with respect to the maximal gap flexibility and the pattern length,but the matching positions in PMW are also hard to choose.The objective to count the maximal number of matches one by one is computationally infeasible.Therefore,rather than solving the generic PMW problem,many research efforts have further defined new problems within PMW according to different application backgrounds.To break through the limitations of either fixing the number or allowing an unbounded number of wildcards,pattern matching with flexible wildcards(PMFW)allows the users to control the ranges of wildcards.In this paper,we provide a survey on the state-of-the-art algorithms for PMFW,with detailed analyses and comparisons,and discuss challenges and opportunities in PMFW research and applications.