Pattern matching method is one of the classic classifications of existing online portfolio selection strategies. This article aims to study the key aspects of this method—measurement of similarity and selection of si...Pattern matching method is one of the classic classifications of existing online portfolio selection strategies. This article aims to study the key aspects of this method—measurement of similarity and selection of similarity sets, and proposes a Portfolio Selection Method based on Pattern Matching with Dual Information of Direction and Distance (PMDI). By studying different combination methods of indicators such as Euclidean distance, Chebyshev distance, and correlation coefficient, important information such as direction and distance in stock historical price information is extracted, thereby filtering out the similarity set required for pattern matching based investment portfolio selection algorithms. A large number of experiments conducted on two datasets of real stock markets have shown that PMDI outperforms other algorithms in balancing income and risk. Therefore, it is suitable for the financial environment in the real world.展开更多
Graph pattern matching(GPM)can be used to mine the key information in graphs.Exact GPM is one of the most commonly used methods among all the GPM-related methods,which aims to exactly find all subgraphs for a given qu...Graph pattern matching(GPM)can be used to mine the key information in graphs.Exact GPM is one of the most commonly used methods among all the GPM-related methods,which aims to exactly find all subgraphs for a given query graph in a data graph.The exact GPM has been widely used in biological data analyses,social network analyses and other fields.In this paper,the applications of the exact GPM were first introduced,and the research progress of the exact GPM was summarized.Then,the related algorithms were introduced in detail,and the experiments on the state-of-the-art exact GPM algorithms were conducted to compare their performance.Based on the experimental results,the applicable scenarios of the algorithms were pointed out.New research opportunities in this area were proposed.展开更多
In this study,a machine vision-based pattern matching technique was applied to estimate the location of an autonomous driving robot and perform 3D tunnel mapping in an underground mine environment.The autonomous drivi...In this study,a machine vision-based pattern matching technique was applied to estimate the location of an autonomous driving robot and perform 3D tunnel mapping in an underground mine environment.The autonomous driving robot continuously detects the wall of the tunnel in the horizontal direction using the light detection and ranging(Li DAR)sensor and performs pattern matching by recognizing the shape of the tunnel wall.The proposed method was designed to measure the heading of the robot by fusion with the inertial measurement units sensor according to the pattern matching accuracy;it is combined with the encoder sensor to estimate the location of the robot.In addition,when the robot is driving,the vertical direction of the underground mine is scanned through the vertical Li DAR sensor and stacked to create a 3D map of the underground mine.The performance of the proposed method was superior to that of previous studies;the mean absolute error achieved was 0.08 m for the X-Y axes.A root mean square error of 0.05 m^(2)was achieved by comparing the tunnel section maps that were created by the autonomous driving robot to those of manual surveying.展开更多
The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It i...The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It is proved by experiment that the algorithm has three features, its construction process is quick, its cost of memory is small. At the same time, its searching process is as quick as the traditional algorithm. The algorithm is suitable for the application which requires preprocessing the patterns dynamically.展开更多
Modern applications require large databases to be searched for regions that are similar to a given pattern. The DNA sequence analysis, speech and text recognition, artificial intelligence, Internet of Things, and many...Modern applications require large databases to be searched for regions that are similar to a given pattern. The DNA sequence analysis, speech and text recognition, artificial intelligence, Internet of Things, and many other applications highly depend on pattern matching or similarity searches. In this paper, we discuss some of the string matching solutions developed in the past. Then, we present a novel mathematical model to search for a given pattern and it’s near approximates in the text.展开更多
Most of the Point Pattern Matching (PPM) algorithm performs poorly when the noise of the point's position and outliers exist. This paper presents a novel and robust PPM algorithm which combined Point Pair Topologi...Most of the Point Pattern Matching (PPM) algorithm performs poorly when the noise of the point's position and outliers exist. This paper presents a novel and robust PPM algorithm which combined Point Pair Topological Characteristics (PPTC) and Spectral Matching (SM) together to solve the afore mentioned issues. In which PPTC, a new shape descriptor, is firstly proposed. A new comparability measurement based on PPTC is defined as the matching probability. Finally, the correct matching results are achieved by the spectral matching method. The synthetic data experiments show its robustness by comparing with the other state-of-art algorithms and the real world data experiments show its effectiveness.展开更多
The rapid development of mobile network brings opportunities for researchers to analyze user behaviors based on largescale network traffic data. It is important for Internet Service Providers(ISP) to optimize resource...The rapid development of mobile network brings opportunities for researchers to analyze user behaviors based on largescale network traffic data. It is important for Internet Service Providers(ISP) to optimize resource allocation and provide customized services to users. The first step of analyzing user behaviors is to extract information of user actions from HTTP traffic data by multi-pattern URL matching. However, the efficiency is a huge problem when performing this work on massive network traffic data. To solve this problem, we propose a novel and accurate algorithm named Multi-Pattern Parallel Matching(MPPM) that takes advantage of HashMap in data searching for extracting user behaviors from big network data more effectively. Extensive experiments based on real-world traffic data prove the ability of MPPM algorithm to deal with massive HTTP traffic with better performance on accuracy, concurrency and efficiency. We expect the proposed algorithm and it parallelized implementation would be a solid base to build a high-performance analysis engine of user behavior based on massive HTTP traffic data processing.展开更多
Pattern matching is a fundamental approach to detect malicious behaviors and information over Internet, which has been gradually used in high-speed network traffic analysis. However, there is a performance bottleneck ...Pattern matching is a fundamental approach to detect malicious behaviors and information over Internet, which has been gradually used in high-speed network traffic analysis. However, there is a performance bottleneck for multi-pattern matching on online compressed network traffic(CNT), this is because malicious and intrusion codes are often embedded into compressed network traffic. In this paper, we propose an online fast and multi-pattern matching algorithm on compressed network traffic(FMMCN). FMMCN employs two types of jumping, i.e. jumping during sliding window and a string jump scanning strategy to skip unnecessary compressed bytes. Moreover, FMMCN has the ability to efficiently process multiple large volume of networks such as HTTP traffic, vehicles traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470MB/s in a midrange switches device, which is faster than the current fastest algorithm ACCH by almost 73.15 MB/s.展开更多
This paper presents an efficient pattern matching algorithm (FSW). FSW improves the searching process for a pattern in a text. It scans the text with the help of four sliding windows. The windows are equal to the leng...This paper presents an efficient pattern matching algorithm (FSW). FSW improves the searching process for a pattern in a text. It scans the text with the help of four sliding windows. The windows are equal to the length of the pattern, allowing multiple alignments in the searching process. The text is divided into two parts;each part is scanned from both sides simultaneously using two sliding windows. The four windows slide in parallel in both parts of the text. The comparisons done between the text and the pattern are done from both of the pattern sides in parallel. The conducted experiments show that FSW achieves the best overall results in the number of attempts and the number of character comparisons compared to the pattern matching algorithms: Two Sliding Windows (TSW), Enhanced Two Sliding Windows algorithm (ETSW) and Berry-Ravindran algorithm (BR). The best time case is calculated and found to be??while the average case time complexity is??.展开更多
Pattern matching is a very important topic in computer science. It has been used in various applications such as information retrieval, virus scanning, DNA sequence analysis, data mining, machine learning, network sec...Pattern matching is a very important topic in computer science. It has been used in various applications such as information retrieval, virus scanning, DNA sequence analysis, data mining, machine learning, network security and pattern recognition. This paper has presented a new pattern matching algorithm—Enhanced ERS-A, which is an improvement over ERS-S algorithm. In ERS-A, two sliding windows are used to scan the text from the left and the right simultaneously. The proposed algorithm also scans the text from the left and the right simultaneously as well as making comparisons with the pattern from both sides simultaneously. The comparisons done between the text and the pattern are done from both sides in parallel. The shift technique used in the Enhanced ERS-A is the four consecutive characters in the text immediately following the pattern window. The experimental results show that the Enhanced ERS-A has enhanced the process of pattern matching by reducing the number of comparisons performed.展开更多
Given a set U which is consisted of strings defined on alphabet Σ, string cross pattern matching is to find all the matches between every two strings in U. It is utilized in text processing like removing the duplicat...Given a set U which is consisted of strings defined on alphabet Σ, string cross pattern matching is to find all the matches between every two strings in U. It is utilized in text processing like removing the duplication of strings. This paper presents a fast string cross pattern matching algorithm based on extracting high frequency strings. Compared with existing algorithms including single-pattern algorithms and multi-pattern matching algorithms, this algorithm is featured by both low time complexity and low space complexity. Because Chinese alphabet is large and the average length of Chinese words is much short, this algorithm is more suitable to process the text written by Chinese, especially when the size of Σ is large and the number of strings is far more than the maximum length of strings of set U.展开更多
This paper discusses potential application of fuzzy set theory,more specifically, pattern matching, in assessing risk in chemicalplants. Risk factors have been evaluated using linguisticrepresentations of the quantity...This paper discusses potential application of fuzzy set theory,more specifically, pattern matching, in assessing risk in chemicalplants. Risk factors have been evaluated using linguisticrepresentations of the quantity of the hazardous substance involved,its frequency of interaction with the environment, severity of itsimpact and the uncertainty involved in its detection in advance. Foreach linguistic value there is a corresponding membership functionranging over a universe of discourse. The risk scenario created by ahazard/hazardous situation having highest degree of featural value istaken as the known pattern.展开更多
In order to identify any traces of suspicious activities for the networks security, Network Traffic Analysis has been the basis of network security and network management. With the continued emergence of new applicati...In order to identify any traces of suspicious activities for the networks security, Network Traffic Analysis has been the basis of network security and network management. With the continued emergence of new applications and encrypted traffic, the currently available approaches can not perform well for all kinds of network data. In this paper, we propose a novel stream pattern matching technique which is not only easily deployed but also includes the advantages of different methods. The main idea is: first, defining a formal description specification, by which any series of data stream can be unambiguously descrbed by a special stream pattern; then a tree representation is constructed by parsing the stream pattern; at last, a stream pattern engine is constructed with the Non-t-mite automata (S-CG-NFA) and Bit-parallel searching algorithms. Our stream pattern analysis system has been fully prototyped on C programming language and Xilinx Vn-tex2 FPGA. The experimental results show the method could provides a high level of recognition efficiency and accuracy.展开更多
Based on the study of single pattern matching, MBF algorithm is proposed by imitating the string searching procedure of human. The algorithm preprocesses the pattern by using the idea of Quick Search algorithm and the...Based on the study of single pattern matching, MBF algorithm is proposed by imitating the string searching procedure of human. The algorithm preprocesses the pattern by using the idea of Quick Search algorithm and the already-matched pattern psefix and suffix information. In searching phase, the algorithm makes use of the!character using frequency and the continue-skip idea. The experiment shows that MBF algorithm is more efficient than other algorithms.展开更多
Pattern matching is a very important algorithm used in many applications such as search engine and DNA analysis. They are aiming to find a pattern in a text. This paper proposes a Pattern Matching Algorithm Using Chan...Pattern matching is a very important algorithm used in many applications such as search engine and DNA analysis. They are aiming to find a pattern in a text. This paper proposes a Pattern Matching Algorithm Using Changing Consecutive Characters (PMCCC) to make the searching pro- cess of the algorithm faster. PMCCC enhances the shift process that determines how the pattern moves in case of the occurrence of the mismatch between the pattern and the text. It enhances the Berry Ravindran (BR) shift function by using m consecutive characters where m is the pattern length. The formal basis and the algorithms are presented. The experimental results show that PMCCC made enhancements in searching process by reducing the number of comparisons and the number of attempts. Comparing the results of PMCCC with other related algorithms has shown significant enhancements in average number of comparisons and average number of attempts.展开更多
Due to its importance in security, syntax analysis has found usage in many high-level programming languages. The Lisp language has its share of operations for evaluating regular expressions, but native parsing of Lisp...Due to its importance in security, syntax analysis has found usage in many high-level programming languages. The Lisp language has its share of operations for evaluating regular expressions, but native parsing of Lisp code in this way is unsupported. Matching on lists requires a significantly more complicated model, with a different programmatic approach than that of string matching. This work presents a new automata-based approach centered on a set of functions and macros for identifying sequences of Lisp S-expressions using finite tree automata. The objective is to test that a given list is an element of a given tree language. We use a macro that takes a grammar and generates a function that reads off the leaves of a tree and tries to parse them as a string in a context-free language. The experimental results indicate that this approach is a viable tool for parsing Lisp lists and expressions in the abstract interpretation展开更多
In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly comple...In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or the ignorance of the explicit document model (DTD—Document Type Definition, Schema, etc.) increases the risk of obtaining an empty result set when the query is too specific, or, too large result set when it is too vague (e.g. it contains wildcards such as “*”). The reason is that in both cases, users write queries according to the document model they have in mind;this can be very far from the one that can actually be extracted from the document. Opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evaluations. Indeed, during their evaluation, certain constraints (the preferences they contain) can be relaxed if necessary to avoid precisely empty results;moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the proposed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) in the context of documents (trees) to incrementally filter into the partial solutions set, those which satisfy the maximum of preferential constraints. The only restriction imposed on documents is No-Self-Containment.展开更多
Images (typically JPEG) are used as evidence against cyber perpetrators. Typically the file is carved using standard patterns. Many concentrate on carving JPEG files and overlook the important of thumbnail in assistin...Images (typically JPEG) are used as evidence against cyber perpetrators. Typically the file is carved using standard patterns. Many concentrate on carving JPEG files and overlook the important of thumbnail in assisting forensic investigation. However, a new unique pattern is used to detect thumbnail/s and embedded JPEG file. This paper is to introduce a tool call PattrecCarv to recognize thumbnail/s or embedded JPEG files using unique hex patterns (UHP). A tool called PattrecCarv is developed to automatically carve thumbnail/s and embedded JPEG files using DFRWS 2006 and DFRWS 2007 datasets. The tool successfully recovers 11.5% more thumbnails and embedded JPEG files than PredClus.展开更多
文摘Pattern matching method is one of the classic classifications of existing online portfolio selection strategies. This article aims to study the key aspects of this method—measurement of similarity and selection of similarity sets, and proposes a Portfolio Selection Method based on Pattern Matching with Dual Information of Direction and Distance (PMDI). By studying different combination methods of indicators such as Euclidean distance, Chebyshev distance, and correlation coefficient, important information such as direction and distance in stock historical price information is extracted, thereby filtering out the similarity set required for pattern matching based investment portfolio selection algorithms. A large number of experiments conducted on two datasets of real stock markets have shown that PMDI outperforms other algorithms in balancing income and risk. Therefore, it is suitable for the financial environment in the real world.
文摘Graph pattern matching(GPM)can be used to mine the key information in graphs.Exact GPM is one of the most commonly used methods among all the GPM-related methods,which aims to exactly find all subgraphs for a given query graph in a data graph.The exact GPM has been widely used in biological data analyses,social network analyses and other fields.In this paper,the applications of the exact GPM were first introduced,and the research progress of the exact GPM was summarized.Then,the related algorithms were introduced in detail,and the experiments on the state-of-the-art exact GPM algorithms were conducted to compare their performance.Based on the experimental results,the applicable scenarios of the algorithms were pointed out.New research opportunities in this area were proposed.
基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.2021R1A2C1011216)。
文摘In this study,a machine vision-based pattern matching technique was applied to estimate the location of an autonomous driving robot and perform 3D tunnel mapping in an underground mine environment.The autonomous driving robot continuously detects the wall of the tunnel in the horizontal direction using the light detection and ranging(Li DAR)sensor and performs pattern matching by recognizing the shape of the tunnel wall.The proposed method was designed to measure the heading of the robot by fusion with the inertial measurement units sensor according to the pattern matching accuracy;it is combined with the encoder sensor to estimate the location of the robot.In addition,when the robot is driving,the vertical direction of the underground mine is scanned through the vertical Li DAR sensor and stacked to create a 3D map of the underground mine.The performance of the proposed method was superior to that of previous studies;the mean absolute error achieved was 0.08 m for the X-Y axes.A root mean square error of 0.05 m^(2)was achieved by comparing the tunnel section maps that were created by the autonomous driving robot to those of manual surveying.
基金This project was supported by the National "863" High Technology Research and Development Program of China(2003AA142160) and the National Natural Science Foundation of China (60402019)
文摘The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It is proved by experiment that the algorithm has three features, its construction process is quick, its cost of memory is small. At the same time, its searching process is as quick as the traditional algorithm. The algorithm is suitable for the application which requires preprocessing the patterns dynamically.
文摘Modern applications require large databases to be searched for regions that are similar to a given pattern. The DNA sequence analysis, speech and text recognition, artificial intelligence, Internet of Things, and many other applications highly depend on pattern matching or similarity searches. In this paper, we discuss some of the string matching solutions developed in the past. Then, we present a novel mathematical model to search for a given pattern and it’s near approximates in the text.
文摘Most of the Point Pattern Matching (PPM) algorithm performs poorly when the noise of the point's position and outliers exist. This paper presents a novel and robust PPM algorithm which combined Point Pair Topological Characteristics (PPTC) and Spectral Matching (SM) together to solve the afore mentioned issues. In which PPTC, a new shape descriptor, is firstly proposed. A new comparability measurement based on PPTC is defined as the matching probability. Finally, the correct matching results are achieved by the spectral matching method. The synthetic data experiments show its robustness by comparing with the other state-of-art algorithms and the real world data experiments show its effectiveness.
基金supported in part by National Natural Science Foundation of China(61671078)the Director Funds of Beijing Key Laboratory of Network System Architecture and Convergence(2017BKL-NSACZJ-06)
文摘The rapid development of mobile network brings opportunities for researchers to analyze user behaviors based on largescale network traffic data. It is important for Internet Service Providers(ISP) to optimize resource allocation and provide customized services to users. The first step of analyzing user behaviors is to extract information of user actions from HTTP traffic data by multi-pattern URL matching. However, the efficiency is a huge problem when performing this work on massive network traffic data. To solve this problem, we propose a novel and accurate algorithm named Multi-Pattern Parallel Matching(MPPM) that takes advantage of HashMap in data searching for extracting user behaviors from big network data more effectively. Extensive experiments based on real-world traffic data prove the ability of MPPM algorithm to deal with massive HTTP traffic with better performance on accuracy, concurrency and efficiency. We expect the proposed algorithm and it parallelized implementation would be a solid base to build a high-performance analysis engine of user behavior based on massive HTTP traffic data processing.
基金supported by China MOST project (No.2012BAH46B04)
文摘Pattern matching is a fundamental approach to detect malicious behaviors and information over Internet, which has been gradually used in high-speed network traffic analysis. However, there is a performance bottleneck for multi-pattern matching on online compressed network traffic(CNT), this is because malicious and intrusion codes are often embedded into compressed network traffic. In this paper, we propose an online fast and multi-pattern matching algorithm on compressed network traffic(FMMCN). FMMCN employs two types of jumping, i.e. jumping during sliding window and a string jump scanning strategy to skip unnecessary compressed bytes. Moreover, FMMCN has the ability to efficiently process multiple large volume of networks such as HTTP traffic, vehicles traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470MB/s in a midrange switches device, which is faster than the current fastest algorithm ACCH by almost 73.15 MB/s.
文摘This paper presents an efficient pattern matching algorithm (FSW). FSW improves the searching process for a pattern in a text. It scans the text with the help of four sliding windows. The windows are equal to the length of the pattern, allowing multiple alignments in the searching process. The text is divided into two parts;each part is scanned from both sides simultaneously using two sliding windows. The four windows slide in parallel in both parts of the text. The comparisons done between the text and the pattern are done from both of the pattern sides in parallel. The conducted experiments show that FSW achieves the best overall results in the number of attempts and the number of character comparisons compared to the pattern matching algorithms: Two Sliding Windows (TSW), Enhanced Two Sliding Windows algorithm (ETSW) and Berry-Ravindran algorithm (BR). The best time case is calculated and found to be??while the average case time complexity is??.
文摘Pattern matching is a very important topic in computer science. It has been used in various applications such as information retrieval, virus scanning, DNA sequence analysis, data mining, machine learning, network security and pattern recognition. This paper has presented a new pattern matching algorithm—Enhanced ERS-A, which is an improvement over ERS-S algorithm. In ERS-A, two sliding windows are used to scan the text from the left and the right simultaneously. The proposed algorithm also scans the text from the left and the right simultaneously as well as making comparisons with the pattern from both sides simultaneously. The comparisons done between the text and the pattern are done from both sides in parallel. The shift technique used in the Enhanced ERS-A is the four consecutive characters in the text immediately following the pattern window. The experimental results show that the Enhanced ERS-A has enhanced the process of pattern matching by reducing the number of comparisons performed.
文摘Given a set U which is consisted of strings defined on alphabet Σ, string cross pattern matching is to find all the matches between every two strings in U. It is utilized in text processing like removing the duplication of strings. This paper presents a fast string cross pattern matching algorithm based on extracting high frequency strings. Compared with existing algorithms including single-pattern algorithms and multi-pattern matching algorithms, this algorithm is featured by both low time complexity and low space complexity. Because Chinese alphabet is large and the average length of Chinese words is much short, this algorithm is more suitable to process the text written by Chinese, especially when the size of Σ is large and the number of strings is far more than the maximum length of strings of set U.
文摘This paper discusses potential application of fuzzy set theory,more specifically, pattern matching, in assessing risk in chemicalplants. Risk factors have been evaluated using linguisticrepresentations of the quantity of the hazardous substance involved,its frequency of interaction with the environment, severity of itsimpact and the uncertainty involved in its detection in advance. Foreach linguistic value there is a corresponding membership functionranging over a universe of discourse. The risk scenario created by ahazard/hazardous situation having highest degree of featural value istaken as the known pattern.
基金This work is supported by the following projects: National Natural Science Foundation of China grant 60772136, 111 Development Program of China NO.B08038, National Science & Technology Pillar Program of China NO.2008BAH22B03 and NO. 2007BAH08B01.
文摘In order to identify any traces of suspicious activities for the networks security, Network Traffic Analysis has been the basis of network security and network management. With the continued emergence of new applications and encrypted traffic, the currently available approaches can not perform well for all kinds of network data. In this paper, we propose a novel stream pattern matching technique which is not only easily deployed but also includes the advantages of different methods. The main idea is: first, defining a formal description specification, by which any series of data stream can be unambiguously descrbed by a special stream pattern; then a tree representation is constructed by parsing the stream pattern; at last, a stream pattern engine is constructed with the Non-t-mite automata (S-CG-NFA) and Bit-parallel searching algorithms. Our stream pattern analysis system has been fully prototyped on C programming language and Xilinx Vn-tex2 FPGA. The experimental results show the method could provides a high level of recognition efficiency and accuracy.
文摘Based on the study of single pattern matching, MBF algorithm is proposed by imitating the string searching procedure of human. The algorithm preprocesses the pattern by using the idea of Quick Search algorithm and the already-matched pattern psefix and suffix information. In searching phase, the algorithm makes use of the!character using frequency and the continue-skip idea. The experiment shows that MBF algorithm is more efficient than other algorithms.
文摘Pattern matching is a very important algorithm used in many applications such as search engine and DNA analysis. They are aiming to find a pattern in a text. This paper proposes a Pattern Matching Algorithm Using Changing Consecutive Characters (PMCCC) to make the searching pro- cess of the algorithm faster. PMCCC enhances the shift process that determines how the pattern moves in case of the occurrence of the mismatch between the pattern and the text. It enhances the Berry Ravindran (BR) shift function by using m consecutive characters where m is the pattern length. The formal basis and the algorithms are presented. The experimental results show that PMCCC made enhancements in searching process by reducing the number of comparisons and the number of attempts. Comparing the results of PMCCC with other related algorithms has shown significant enhancements in average number of comparisons and average number of attempts.
文摘Due to its importance in security, syntax analysis has found usage in many high-level programming languages. The Lisp language has its share of operations for evaluating regular expressions, but native parsing of Lisp code in this way is unsupported. Matching on lists requires a significantly more complicated model, with a different programmatic approach than that of string matching. This work presents a new automata-based approach centered on a set of functions and macros for identifying sequences of Lisp S-expressions using finite tree automata. The objective is to test that a given list is an element of a given tree language. We use a macro that takes a grammar and generates a function that reads off the leaves of a tree and tries to parse them as a string in a context-free language. The experimental results indicate that this approach is a viable tool for parsing Lisp lists and expressions in the abstract interpretation
文摘In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or the ignorance of the explicit document model (DTD—Document Type Definition, Schema, etc.) increases the risk of obtaining an empty result set when the query is too specific, or, too large result set when it is too vague (e.g. it contains wildcards such as “*”). The reason is that in both cases, users write queries according to the document model they have in mind;this can be very far from the one that can actually be extracted from the document. Opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evaluations. Indeed, during their evaluation, certain constraints (the preferences they contain) can be relaxed if necessary to avoid precisely empty results;moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the proposed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) in the context of documents (trees) to incrementally filter into the partial solutions set, those which satisfy the maximum of preferential constraints. The only restriction imposed on documents is No-Self-Containment.
文摘Images (typically JPEG) are used as evidence against cyber perpetrators. Typically the file is carved using standard patterns. Many concentrate on carving JPEG files and overlook the important of thumbnail in assisting forensic investigation. However, a new unique pattern is used to detect thumbnail/s and embedded JPEG file. This paper is to introduce a tool call PattrecCarv to recognize thumbnail/s or embedded JPEG files using unique hex patterns (UHP). A tool called PattrecCarv is developed to automatically carve thumbnail/s and embedded JPEG files using DFRWS 2006 and DFRWS 2007 datasets. The tool successfully recovers 11.5% more thumbnails and embedded JPEG files than PredClus.