Abstract: Given a set U of strings over an alphabet Σ, string cross pattern matching finds all matches between every pair of strings in U. It is used in text-processing tasks such as removing duplicate strings. This paper presents a fast string cross pattern matching algorithm based on extracting high-frequency strings. Compared with existing single-pattern and multi-pattern matching algorithms, it has both low time complexity and low space complexity. Because the Chinese alphabet (character set) is large and Chinese words are short on average, the algorithm is well suited to processing Chinese text, especially when Σ is large and the number of strings in U far exceeds the maximum string length in U.
Funding: Supported by the Natural Science Foundation of Jiangxi Province of China (2011ZBAB211002) and the International Science and Technology Cooperation Program of China (ISTCP) (2010DFA70990).
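The cross-matching task defined in the abstract above can be illustrated with a brute-force baseline. This is only a sketch of the problem statement, not the paper's high-frequency-string algorithm, which the abstract does not describe in detail. It assumes that a "match" means one string of U occurring as a substring of another; the function name `cross_matches` is hypothetical.

```python
from itertools import permutations


def cross_matches(strings):
    """Brute-force cross pattern matching (illustrative baseline only).

    Assumption: a match is an occurrence of one string of U inside another.
    Returns (pattern, text, position) triples for every ordered pair of
    distinct strings, including overlapping occurrences.
    """
    # Treat U as a set: drop duplicates and empty strings, fix the order.
    unique = sorted({s for s in strings if s})
    results = []
    for pattern, text in permutations(unique, 2):
        start = text.find(pattern)
        while start != -1:
            results.append((pattern, text, start))
            # Continue one position later so overlapping matches are found.
            start = text.find(pattern, start + 1)
    return results


if __name__ == "__main__":
    for match in cross_matches(["ab", "abab", "b"]):
        print(match)
```

This baseline examines every ordered pair of strings, so its cost grows quadratically with the number of strings in U, which is the regime (many short strings) that the abstract singles out as the target of the faster algorithm.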
Abstract: It is difficult for security experts to generate polymorphic signatures using traditional string mining and matching techniques. A semantic-aware method is presented that generates a two-level signature comprising both polymorphic semantics and string patterns. It first analyzes the characteristics of polymorphic engines and groups the data flows into clusters, then uses static data-flow methods to extract invariant semantic instructions. Finally, it applies traditional string methods to generate the signature. Experimental results show that, compared with other methods, it may effectively reduce false positives and false negatives.