Multi-pattern matching with wildcards is a problem of finding the occurrence of all patterns in a pattern set {p^1,… ,p^k} in a given text t. If the percentage of wildcards in pattern set is not high, this problem ca...Multi-pattern matching with wildcards is a problem of finding the occurrence of all patterns in a pattern set {p^1,… ,p^k} in a given text t. If the percentage of wildcards in pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards to overcome the high percentage of the occurrence of wildcards in patterns. In our proposed method, patterns are matched as bit patterns using a sliding window approach. The window is a bit window that slides along the given text, matching against stored bit patterns. Matching process is executed using bit wise operations. The experimental results demonstrate that the percentage of wildcard occurrence does not affect the proposed algorithm's performance and the proposed algorithm is more efficient than the algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ )(m/w)) time, where n is text length, d is symbol distribution over k patterns, m is pattern length, and σ is alphabet size.展开更多
基金Supported by the European Framework Program(FP7)(FP7-PEOPLE-2011-IRSES)the National Sci-Tech Support Plan of China(2014BAH02F03)
文摘Multi-pattern matching with wildcards is a problem of finding the occurrence of all patterns in a pattern set {p^1,… ,p^k} in a given text t. If the percentage of wildcards in pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards to overcome the high percentage of the occurrence of wildcards in patterns. In our proposed method, patterns are matched as bit patterns using a sliding window approach. The window is a bit window that slides along the given text, matching against stored bit patterns. Matching process is executed using bit wise operations. The experimental results demonstrate that the percentage of wildcard occurrence does not affect the proposed algorithm's performance and the proposed algorithm is more efficient than the algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ )(m/w)) time, where n is text length, d is symbol distribution over k patterns, m is pattern length, and σ is alphabet size.