Abstract
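Maximum Pattern Matching with Gaps and One-Off Condition (MPMGOOC) is a pattern matching problem with length constraints on wildcards; its task is to find the largest set of mutually independent occurrences. Based on a new nonlinear data structure, the Nettree, this paper proposes a heuristic algorithm for the MPMGOOC problem. Unlike in a tree, any node of a Nettree other than the root may have more than one parent. The paper gives the definition of the Nettree together with its related concepts and properties. Based on these concepts and properties, it proposes a heuristic algorithm named Selecting Better Occurrence (SBO). In the loop that searches for one occurrence, the algorithm uses the Strategy of Greedy-Search Parent (SGSP) and the Strategy of RightMost Parent (SRMP) to find two occurrences with the same leaf, and selects the better of the two as the result of SBO. The core idea of SGSP is to find, at each step, an Approximately Optimal Parent (AOP) of the current node; the core idea of SRMP is to find, at each step, the rightmost parent of the current node. Experimental results show that in most cases SBO obtains better solutions, and that solution quality improves significantly over other algorithms. The paper not only provides a heuristic algorithm for the MPMGOOC problem but, more importantly, offers a useful reference for solving other complex problems.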
Maximum Pattern Matching with Gaps and the One-Off Condition (MPMGOOC) is an interesting and challenging pattern matching problem that seeks the maximum number of occurrences of a pattern in a sequence. In this paper, a heuristic algorithm based on a new nonlinear data structure, the Nettree, is proposed for this problem. A Nettree differs from a regular tree in that a node may have more than one parent. The algorithm is named Selecting Better Occurrence (SBO). SBO uses special concepts and properties of the Nettree to solve the task. In the loop that finds an occurrence, SBO uses two strategies, the Strategy of Greedy-Search Parent (SGSP) and the Strategy of RightMost Parent (SRMP), to find two occurrences with the same leaf, and then selects the better of the two. At each step of the search for an occurrence, SGSP finds an Approximately Optimal Parent (AOP) of the current node, whereas SRMP finds its rightmost parent. Extensive experimental results on real-world biological data demonstrate that SBO achieves the best performance among all competitive algorithms in terms of solution quality. This paper not only provides a heuristic solution for the MPMGOOC problem, but also shows that the Nettree can be used to solve other complex problems.
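The rightmost-parent idea described above can be illustrated with a minimal Python sketch. It is not the paper's SBO algorithm: the abstract does not define the AOP, so SGSP is omitted, and several details are assumptions made only for illustration. These include the function names `build_nettree` and `rightmost_parent_occurrences`, a single gap range `[min_gap, max_gap]` (with `min_gap >= 0`) shared by all adjacent pattern characters, scanning leaves from right to left, and abandoning a leaf as soon as no unused parent remains.

```python
def build_nettree(seq, pattern, min_gap, max_gap):
    """Build a simplified Nettree (illustrative layout, not the paper's).

    levels[j] maps each position of seq that matches pattern[j] to the
    ascending list of its parent positions on level j - 1. A node on a
    level below the first is kept only if it has at least one parent.
    """
    levels = []
    for j, ch in enumerate(pattern):
        level = {}
        for pos, c in enumerate(seq):
            if c != ch:
                continue
            if j == 0:
                level[pos] = []  # first-level nodes have no parents
            else:
                # The number of wildcards between parent and child
                # must lie within [min_gap, max_gap].
                parents = [p for p in levels[j - 1]
                           if min_gap <= pos - p - 1 <= max_gap]
                if parents:
                    level[pos] = parents
        levels.append(level)
    return levels


def rightmost_parent_occurrences(seq, pattern, min_gap, max_gap):
    """Greedy search that always climbs to the rightmost unused parent.

    Each position of seq is used in at most one occurrence (the
    one-off condition). Leaf order and the no-backtracking rule are
    assumptions of this sketch.
    """
    levels = build_nettree(seq, pattern, min_gap, max_gap)
    used = set()
    occurrences = []
    for leaf in sorted(levels[-1], reverse=True):
        if leaf in used:
            continue
        path = [leaf]
        for j in range(len(pattern) - 1, 0, -1):
            candidates = [p for p in levels[j][path[-1]] if p not in used]
            if not candidates:
                path = None  # give up on this leaf, no backtracking
                break
            path.append(candidates[-1])  # rightmost unused parent
        if path is not None:
            path.reverse()
            used.update(path)
            occurrences.append(path)
    return occurrences


result = rightmost_parent_occurrences("aabb", "ab", 0, 1)
print(result)
```

In this toy input the node at position 2 has two parents (positions 0 and 1), which is the property that distinguishes a Nettree from a tree. A greedy search of this kind offers no optimality guarantee, which is consistent with the abstract's framing of SBO as a heuristic.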
Source
《计算机学报》
EI
CSCD
PKU Core (北大核心)
2011, Issue 8, pp. 1452-1462 (11 pages)
Chinese Journal of Computers
Funding
Supported by the National Natural Science Foundation of China (60828005)
Keywords
pattern matching
wildcards
one-off condition
Nettree
heuristic algorithm