The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It i...The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It is proved by experiment that the algorithm has three features, its construction process is quick, its cost of memory is small. At the same time, its searching process is as quick as the traditional algorithm. The algorithm is suitable for the application which requires preprocessing the patterns dynamically.展开更多
Pattern matching is a fundamental approach to detect malicious behaviors and information over Internet, which has been gradually used in high-speed network traffic analysis. However, there is a performance bottleneck ...Pattern matching is a fundamental approach to detect malicious behaviors and information over Internet, which has been gradually used in high-speed network traffic analysis. However, there is a performance bottleneck for multi-pattern matching on online compressed network traffic(CNT), this is because malicious and intrusion codes are often embedded into compressed network traffic. In this paper, we propose an online fast and multi-pattern matching algorithm on compressed network traffic(FMMCN). FMMCN employs two types of jumping, i.e. jumping during sliding window and a string jump scanning strategy to skip unnecessary compressed bytes. Moreover, FMMCN has the ability to efficiently process multiple large volume of networks such as HTTP traffic, vehicles traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470MB/s in a midrange switches device, which is faster than the current fastest algorithm ACCH by almost 73.15 MB/s.展开更多
Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion detection.In this paper,we present an attempt to design effic...Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion detection.In this paper,we present an attempt to design efficient multiple pattern searching algorithms on multi-core architectures.We observe an important feature which indicates that the multiple pattern matching time mainly depends on the number and minimal length of patterns.The multi-core algorithm proposed in this paper leverages this feature to decompose pattern set so that the parallel execution time is minimized.We formulate the problem as an optimal decomposition and scheduling of a pattern set,then propose a heuristic algorithm,which takes advantage of dynamic programming and greedy algorithmic techniques,to solve the optimization problem.Experimental results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.展开更多
Objectives:Genomic signatures like k-mers have become one of the most prominent approaches to describe genomic data.As a result,myriad real-world applications,such as the construction of de Bruijn graphs in genome ass...Objectives:Genomic signatures like k-mers have become one of the most prominent approaches to describe genomic data.As a result,myriad real-world applications,such as the construction of de Bruijn graphs in genome assembly,have been benefited by recognizing genomic signatures.In other words,an efficient approachof genomic signatureprofiling is an essential need for tackling high-throughput sequencing reads.However,most of the existing approaches only recognize fixed-size k-merswhile many research studies have shown the importance of considering variable-length k-mers.Methods:In this paper,we present a novel genomic signature profiling approach,TahcoRoll,by extending the Aho–Corasick algorithm(AC)for the task of profiling variable-length k-mers.We first group nucleotides into two clusters and represent each cluster with a bit.The rolling hash technique is further utilized to encode signatures and read patterns for efficient matching.Results:In extensive experiments,TahcoRoll significantly outperforms the most state-of-the-art k-mer counters and has the capability of processing reads across different sequencing platforms on a budget desktop computer.Conclusions:The single-thread version of TahcoRoll is as efficient as the eight-thread version of the state-of-the-art,JellyFish,while the eight-thread TahcoRoll outperforms the eight-thread JellyFish by at least four times.展开更多
基金This project was supported by the National "863" High Technology Research and Development Program of China(2003AA142160) and the National Natural Science Foundation of China (60402019)
文摘The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It is proved by experiment that the algorithm has three features, its construction process is quick, its cost of memory is small. At the same time, its searching process is as quick as the traditional algorithm. The algorithm is suitable for the application which requires preprocessing the patterns dynamically.
基金supported by China MOST project (No.2012BAH46B04)
文摘Pattern matching is a fundamental approach to detect malicious behaviors and information over Internet, which has been gradually used in high-speed network traffic analysis. However, there is a performance bottleneck for multi-pattern matching on online compressed network traffic(CNT), this is because malicious and intrusion codes are often embedded into compressed network traffic. In this paper, we propose an online fast and multi-pattern matching algorithm on compressed network traffic(FMMCN). FMMCN employs two types of jumping, i.e. jumping during sliding window and a string jump scanning strategy to skip unnecessary compressed bytes. Moreover, FMMCN has the ability to efficiently process multiple large volume of networks such as HTTP traffic, vehicles traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470MB/s in a midrange switches device, which is faster than the current fastest algorithm ACCH by almost 73.15 MB/s.
基金supported by the National Natural Science Foundation of China under Grant Nos.60803030,60925009,60921002the National Basic Research 973 Program of China under Grant No.2011CB302502
文摘Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion detection.In this paper,we present an attempt to design efficient multiple pattern searching algorithms on multi-core architectures.We observe an important feature which indicates that the multiple pattern matching time mainly depends on the number and minimal length of patterns.The multi-core algorithm proposed in this paper leverages this feature to decompose pattern set so that the parallel execution time is minimized.We formulate the problem as an optimal decomposition and scheduling of a pattern set,then propose a heuristic algorithm,which takes advantage of dynamic programming and greedy algorithmic techniques,to solve the optimization problem.Experimental results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.
文摘Objectives:Genomic signatures like k-mers have become one of the most prominent approaches to describe genomic data.As a result,myriad real-world applications,such as the construction of de Bruijn graphs in genome assembly,have been benefited by recognizing genomic signatures.In other words,an efficient approachof genomic signatureprofiling is an essential need for tackling high-throughput sequencing reads.However,most of the existing approaches only recognize fixed-size k-merswhile many research studies have shown the importance of considering variable-length k-mers.Methods:In this paper,we present a novel genomic signature profiling approach,TahcoRoll,by extending the Aho–Corasick algorithm(AC)for the task of profiling variable-length k-mers.We first group nucleotides into two clusters and represent each cluster with a bit.The rolling hash technique is further utilized to encode signatures and read patterns for efficient matching.Results:In extensive experiments,TahcoRoll significantly outperforms the most state-of-the-art k-mer counters and has the capability of processing reads across different sequencing platforms on a budget desktop computer.Conclusions:The single-thread version of TahcoRoll is as efficient as the eight-thread version of the state-of-the-art,JellyFish,while the eight-thread TahcoRoll outperforms the eight-thread JellyFish by at least four times.