
Sequential Pattern Matching with General Gap and One-Off Condition

Cited by: 3
Abstract: Pattern matching with gap constraints is one of the key problems in sequential pattern mining. Most existing work considers only non-negative gaps, which strictly fixes the order in which the characters of the pattern must appear in the sequence. To make matching more flexible, and because the one-off condition is a more reasonable choice in sequential pattern mining, this paper studies pattern matching under general gaps and the one-off condition, a problem that is NP-hard. To solve it effectively, the paper proposes the MSAING (maximum sequential pattern matching with one-off and general gaps condition) algorithm. First, a Reverse strategy brings the pattern and the sequence into the best matching state. Second, a linear-table structure substantially reduces the time and space consumed during matching, while a backtracking mechanism raises the matching success rate. Finally, an inside_Checking mechanism determines whether the pattern can produce internal repetition, which further improves execution efficiency. The completeness of MSAING is proved theoretically, and experimental results verify the accuracy of its matching results and its efficiency in both time and space.
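Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2018, Issue 2, pp. 363-382 (20 pages)
Funding: National Key R&D Program of China (2016YFB1000901); National Natural Science Foundation of China (61202227)
Keywords: general gap; one-off condition; pattern matching; linear table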
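
The problem named in the abstract can be illustrated with a small sketch. This is not the MSAING algorithm: it is a plain greedy search with backtracking, and because the problem is NP-hard it is not guaranteed to find the maximum number of occurrences. The names `find_occurrence` and `one_off_matches`, the 0-based indexing, and the exact gap definition (gap = next position - previous position - 1, with negative values letting a later pattern character sit before an earlier one) are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple

Gap = Tuple[int, int]  # (min_gap, max_gap); min_gap may be negative (assumed convention)


def find_occurrence(seq: str, chars: Sequence[str], gaps: Sequence[Gap],
                    used: List[bool], start: int) -> Optional[List[int]]:
    """Search, with backtracking, for one occurrence whose first character
    is at index `start` and which touches only unused positions."""
    if used[start] or seq[start] != chars[0]:
        return None
    positions = [start]

    def extend(j: int) -> bool:
        if j == len(chars):
            return True
        lo, hi = gaps[j - 1]
        prev = positions[-1]
        # gap = nxt - prev - 1, so nxt ranges over prev + 1 + lo .. prev + 1 + hi
        first = max(0, prev + 1 + lo)
        last = min(len(seq) - 1, prev + 1 + hi)
        for nxt in range(first, last + 1):
            # One-off condition: skip positions already consumed, either by an
            # earlier occurrence or by this occurrence itself.
            if used[nxt] or nxt in positions or seq[nxt] != chars[j]:
                continue
            positions.append(nxt)
            if extend(j + 1):
                return True
            positions.pop()  # backtrack and try the next candidate position
        return False

    return positions if extend(1) else None


def one_off_matches(seq: str, chars: Sequence[str],
                    gaps: Sequence[Gap]) -> List[List[int]]:
    """Greedily collect occurrences so that every sequence position is used
    at most once overall (a valid set, not necessarily a maximum one)."""
    used = [False] * len(seq)
    found: List[List[int]] = []
    for start in range(len(seq)):
        occ = find_occurrence(seq, chars, gaps, used, start)
        if occ is not None:
            for p in occ:
                used[p] = True
            found.append(occ)
    return found


if __name__ == "__main__":
    print(one_off_matches("baab", "ab", [(-3, 1)]))  # general gap
    print(one_off_matches("baab", "ab", [(0, 1)]))   # non-negative gap only
```

Under these assumptions, the pattern a[-3,1]b on the sequence `baab` yields two occurrences, `[1, 0]` and `[2, 3]`, whereas the non-negative gap a[0,1]b yields only `[1, 3]`. This is the extra flexibility that general gaps are meant to provide.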
