期刊文献+

一种求解MPMGOOC问题的启发式算法 被引量:21

A Heuristic Algorithm for MPMGOOC
下载PDF
导出
摘要 具有间隙约束和一次性条件的最大模式匹配(Maximum Pattern Matching with Gaps and One-Off Condition,MPMGOOC)是一种具有通配符长度约束的模式匹配问题,其任务是寻找彼此互不相关的最多出现.文中基于一种新的非线性数据结构——网树,提出了一种解决MPMGOOC问题的启发式算法.与树结构不同之处在于,除根结点外,网树中任何结点可以多于1个双亲结点.文中给出了网树的定义及其相关的概念和性质.基于这些概念和性质,提出了一种选择较优出现(Selecting Better Occurrence,SBO)的启发式算法.该算法在搜索一个出现的循环中,采用了贪婪搜索双亲策略(Strategy of Greedy-Search Parent,SGSP)和最右双亲策略(Strategy of RightMostParent,SRMP)寻找相同叶子的两个出现并选择其中较好的出现作为SBO算法的结果.SGSP策略的核心思想是每一步都寻找当前结点的一个近似最优双亲(Approximately Optimimal Parent,AOP);SRMP策略的核心思想是每一步都寻找当前结点的最右双亲结点.实验结果表明,在多数情况下SBO算法可以获得更好的解且解的质量较其它算法有显著的提高.文中不但提供了一个解决MPMGOOC问题的启发式算法,更重要的是对于求解其它复杂问题具有一定的参考价值. Maximum Pattern Matching with Gaps and the One-Off Condition (MPMGOOC) is an interesting and challenging pattern matching problem, which seeks to find the maximal number of occurrences of a pattern in a sequence. In this paper, a heuristic algorithm based on a new nonlin- ear data structure, Nettree, is proposed for this problem. A Nettree is different from a regular tree in that a node may have more than one parent. The algorithm is named Selecting Better Occurrence (SBO). SBO uses some special concepts and properties of the Nettree to solve the task. In the loop of finding an occurrence, SBO uses two strategies, Strategy of Greedy-Search Parent (SGSP) and Strategy of RightMost Parent (SRMP) to find two occurrences with the same leaf, and then selects a better occurrence from the results of SGSP and SRMP. The main ideas of SGSP and SRMP are to find an Approximately Optimal Parent (AOP) and the rightmost parent of the current node at each step in the process of searching for an occurrence, respectively. Extensive experimental results on real-world biological data demonstrate that SBO achieves the best performance among all competitive algorithms in terms of solution quality. This paper not only provides a heuristic solution for the MPMGOOC problem, but also shows that the Nettree can be used to solve other complex problems.
出处 《计算机学报》 EI CSCD 北大核心 2011年第8期1452-1462,共11页 Chinese Journal of Computers
基金 国家自然科学基金(60828005)资助~~
关键词 模式匹配 通配符 一次性条件 网树 启发式算法 pattern matching wildcards oneoff condition Nettree heuristic algorithm
  • 相关文献

参考文献14

  • 1Lunteren J V. High-performance pattern-matching for intrusion detection//Proceedings of the 25th IEEE International Conference on Computer Communications ( INFOCOM 2006). Barcelona, Spain, 2006:1-13.
  • 2Califf M E, Mooney R J. Bottom up relational learning of pattern matching rules for information extraction. Journal of Machine Learning Research, 2003, 4(6): 177-210.
  • 3Cole R, Gottlieb L A, Lewenstein M. Dictionary matching and indexing with errors and don't cares//Proceedings of the 36th ACM Symposium on the Theory of Computing. New York, USA, 2004:91-100.
  • 4Cole J R, Chai B, Farris R J, Wang Q, Kulam S A, McGartell D M, Garrity G M, Tiedje J M. The ribosomal database project (RDP-II) : Sequences and tools for high-throughput rRNA analysis. Nucleic Acids Research, 2005, 33(Sup. 1): 294-296.
  • 5Zhang M, Kao B, Cheung D, Yip K. Mining periodic patterns with gap requirement from sequences//Proceedings of the ACM SIGMOD International Conference on Management of Data. Maryland, USA, 2005:623-633.
  • 6Han J, Cheng H, Xin D, Yan X. Frequent pattern mining.. Current status and future directions. Data Mining and Knowledge Discovery, 2007, 15(1) : 55-86.
  • 7Ji X, Bailey J, Dong G. Mining minimal distinguishing subsequence patterns with gap constraints. Knowledge and Information Systems, 2007, 11(3): 259- 286.
  • 8He Y, Wu X, Zhu X, Arslan A N. Mining frequent patterns with wildcards from biological sequences//Proceedings of the 2007 IEEE International Conference on Information Reuse and Integration(IRI-07). Las Vegas, USA, 2007: 329-334.
  • 9Fischer M J, Paterson M S. String matching and other products//Proeeedings of the 7th SIAM AMS Complexity of Computation. Cambridge, USA, 1974:113-125.
  • 10Manber U, Baeza-Yates R. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 1991, 37(3): 133 -136.

同被引文献165

  • 1刘夫云,祁国宁,车宏安.复杂网络中简单路径搜索算法及其应用研究[J].系统工程理论与实践,2006,26(4):9-13. 被引量:24
  • 2潘云鹤,王金龙,徐从富.数据流频繁模式挖掘研究进展[J].自动化学报,2006,32(4):594-602. 被引量:34
  • 3Kim S K. Optimal algorithms for finding the longest path with length and sum constraints in a tree. IEICE Transac- tions on Information and Systems, 2011, E94-D(6): 1325- 1328.
  • 4Williams R. Finding paths of length k in O * (2k) time. Information Processing Letters, 2009, 109(6): 315-318.
  • 5Uehara R, Valiente G. Linear structure of bipartite permuta tion graphs and the longest path problem. Information Pro- cessing Letters, 2007, 103(2): 71-77.
  • 6Ioannidou K, Mertzios G, Nikolopoulos S. The longest path problem is polynomial on interval graphs. Mathematical Foundations of Computer Science, 2009, (5734): 403-414.
  • 7Edmonds J, Chakraborty S. Bounding variance and expecta tion of longest path lengths in DAGs//Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algo- rithms. Philadelphia, USA, 2010:766-781.
  • 8Chen J, Lu S, Sze S, Zhang F. Improved algorithms for path, matching, and packing problems//Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algo- rithms. Philadelphia, USA, 2007: 298-307.
  • 9Scott J, Ideker T, Karp R M, Sharan R. Efficient algo- rithms for detecting signaling pathways in protein interaction networks. Journal of Computational Biology, 2006, 13 (2) 133-144.
  • 10Desaulniers G, Lessard F, Hadjar A. Tabu search, partial elementarity, and generalized k-path inequalities for the vehi cle routing problem with time windows. Transportation Science, 2008, 42(3): 387-404.

引证文献21

二级引证文献39

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部