带通配符的模式匹配是一个经典的研究问题,带有可变间隙约束的模式匹配是近年来比较热门的研究方向.为适应某些查询精度要求较高的应用领域,提出一种在稀疏间隙约束条件下求解模式匹配完备解的算法SGPM-SAI(pattern matching with spars...带通配符的模式匹配是一个经典的研究问题,带有可变间隙约束的模式匹配是近年来比较热门的研究方向.为适应某些查询精度要求较高的应用领域,提出一种在稀疏间隙约束条件下求解模式匹配完备解的算法SGPM-SAI(pattern matching with sparse gaps constraint based on suffix automaton index).SGPM-SAI通过对文本串预处理,建立一种称为W-SAM的图索引结构,然后对模式串分段查找End Pos集合,最后以集合归并求交的方法得到模式匹配的完备解.实验结果表明:在不考虑预处理时间的情况下,相比几种最典型的模式匹配算法(KMP,BM,AC,suffixarray),SGPM-SAI算法性能优势显著,至少高出3~5倍.通过与SAIL算法的最新优化版本(SAIL-Gen)进行比较,在稀疏间隙约束条件下,SGPM-SAI的性能要显著优于SAIL-Gen算法.此外,为有效利用现代处理器的大规模并行处理单元,提出了并行优化后的算法Parallel SGPM-SAI.实验结果表明:Parallel SGPM-SAI算法的加速效果显著,且具有良好的并行可扩展性,能够充分利用现代众核处理器的高并行计算优势.展开更多
In recent years, there has been a growing need for complex event processing (CEP), ranging from supply chain management to security monitoring. In many scenarios events are generated in different sources but arrive ...In recent years, there has been a growing need for complex event processing (CEP), ranging from supply chain management to security monitoring. In many scenarios events are generated in different sources but arrive at the central server out of order, due to the differences of network latencies. Most state-of-the-art techniques process out-of-order events by buffering the events until the total event order within a specified range can be guaranteed. Their main problems are leading to increasing response time and reducing system throughput. This paper aims to build a high performance out-of- order event processing mechanism, which can match events as soon as they arrive instead of buffering them till all arrive. A suffix-automaton-based event matching algorithm is proposed to speed up query processing, and a confidence-based accuracy evaluation is proposed to control the query result quality. The performance of our approach is evaluated through detailed accuracy and response time analysis. As experimental results show, our approach can obviously speed up the query matching time and produce reasonable query results.展开更多
文摘带通配符的模式匹配是一个经典的研究问题,带有可变间隙约束的模式匹配是近年来比较热门的研究方向.为适应某些查询精度要求较高的应用领域,提出一种在稀疏间隙约束条件下求解模式匹配完备解的算法SGPM-SAI(pattern matching with sparse gaps constraint based on suffix automaton index).SGPM-SAI通过对文本串预处理,建立一种称为W-SAM的图索引结构,然后对模式串分段查找End Pos集合,最后以集合归并求交的方法得到模式匹配的完备解.实验结果表明:在不考虑预处理时间的情况下,相比几种最典型的模式匹配算法(KMP,BM,AC,suffixarray),SGPM-SAI算法性能优势显著,至少高出3~5倍.通过与SAIL算法的最新优化版本(SAIL-Gen)进行比较,在稀疏间隙约束条件下,SGPM-SAI的性能要显著优于SAIL-Gen算法.此外,为有效利用现代处理器的大规模并行处理单元,提出了并行优化后的算法Parallel SGPM-SAI.实验结果表明:Parallel SGPM-SAI算法的加速效果显著,且具有良好的并行可扩展性,能够充分利用现代众核处理器的高并行计算优势.
基金supported by the National Natural Science Foundation of China under Grant Nos.61003058,60933001the Fundamental Research Funds for the Central Universities under Grant No.N090104001
文摘In recent years, there has been a growing need for complex event processing (CEP), ranging from supply chain management to security monitoring. In many scenarios events are generated in different sources but arrive at the central server out of order, due to the differences of network latencies. Most state-of-the-art techniques process out-of-order events by buffering the events until the total event order within a specified range can be guaranteed. Their main problems are leading to increasing response time and reducing system throughput. This paper aims to build a high performance out-of- order event processing mechanism, which can match events as soon as they arrive instead of buffering them till all arrive. A suffix-automaton-based event matching algorithm is proposed to speed up query processing, and a confidence-based accuracy evaluation is proposed to control the query result quality. The performance of our approach is evaluated through detailed accuracy and response time analysis. As experimental results show, our approach can obviously speed up the query matching time and produce reasonable query results.