Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60803030, 60925009, and 60921002, and by the National Basic Research 973 Program of China under Grant No. 2011CB302502.
Abstract: Due to the huge size of the pattern sets to be searched, multiple pattern searching remains a challenge for several emerging applications such as network intrusion detection. In this paper, we present an attempt to design efficient multiple pattern searching algorithms on multi-core architectures. We observe an important feature: the multiple pattern matching time mainly depends on the number and the minimal length of the patterns. The multi-core algorithm proposed in this paper leverages this feature to decompose the pattern set so that the parallel execution time is minimized. We formulate the problem as an optimal decomposition and scheduling of a pattern set, then propose a heuristic algorithm, which takes advantage of dynamic programming and greedy algorithmic techniques, to solve the optimization problem. Experimental results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.
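The decomposition idea in this abstract can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the paper's heuristic: the cost model `group_cost` (number of patterns divided by the minimal pattern length) is a hypothetical stand-in for the observed dependence on pattern count and minimal length, and the function names are invented for illustration. Patterns are assumed to be non-empty and `cores` to be at least 1.

```python
from functools import lru_cache


def group_cost(lengths):
    """Toy cost model (an assumption, not from the paper): a group is slower
    when it holds more patterns and when its shortest pattern is shorter."""
    return len(lengths) / min(lengths)


def decompose(patterns, cores):
    """Split patterns, sorted by length, into at most `cores` contiguous
    groups so that the slowest group's toy cost is as small as possible.
    Returns (makespan, groups)."""
    ordered = sorted(patterns, key=len)
    lens = [len(p) for p in ordered]
    n = len(ordered)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best split of ordered[i:] into at most k groups: (makespan, cut points).
        if i == n:
            return 0.0, ()
        if k == 1:
            return group_cost(lens[i:]), (n,)
        result = None
        for j in range(i + 1, n + 1):
            head = group_cost(lens[i:j])
            tail, cuts = best(j, k - 1)
            candidate = (max(head, tail), (j,) + cuts)
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    makespan, cuts = best(0, cores)
    groups, start = [], 0
    for end in cuts:
        groups.append(ordered[start:end])
        start = end
    return makespan, groups


patterns = ["ab", "cd", "abcdefgh", "ijklmnop", "qrstuvwx", "yzabcdef"]
makespan, groups = decompose(patterns, cores=2)
```

Under this toy model the two short patterns end up in their own group, because a short minimal length penalizes every pattern that shares a group with them. A real system would replace `group_cost` with measured matching times.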
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, No. 2015AA01A709).
Abstract: Massive machine type communication (mMTC) is one of the key application scenarios for fifth generation mobile communication (5G). Grant-free (GF) transmission can reduce the high signaling overhead in mMTC. Non-orthogonal multiple access (NMA) can support more mMTC users than orthogonal frequency division multiple access (OFDMA). Applying GF transmission in NMA systems has recently become an active topic. This paper presents an in-depth study of applying GF transmission in pattern division multiple access (PDMA), a competitive candidate NMA scheme. The definition, latency, and allocation of resources, as well as the transmission mechanism for GF-PDMA, are discussed in detail. Link-level and system-level evaluations are provided to verify the analysis. The analysis and simulation results demonstrate that the proposed GF-PDMA has lower latency than grant-based PDMA (GB-PDMA), scales well in the face of collisions, and provides a gain of almost 2.15 times over GF-OFDMA in the number of active users supported in the system.
Funding: This work is supported by the National Natural Science Foundation of China under Grants 52274057, 52074340 and 51874335; the Major Scientific and Technological Projects of CNPC under Grant ZD2019-183-008; the Major Scientific and Technological Projects of CNOOC under Grant CCL2022RCPS0397RSN; the Science and Technology Support Plan for Youth Innovation of University in Shandong Province under Grant 2019KJH002; and the 111 Project under Grant B08028.
Abstract: Production forecasting is a crucial and difficult step in assessing whether a development strategy will be profitable enough. To make predictions accurate, the development history of other reservoirs in the same class is usually studied. However, the permeability field, well patterns, and development regime must all be similar for two reservoirs to be considered in the same class. Because such similar reservoirs are hard to find, very little experience from other reservoirs is usable, even though a great deal of historical information exists on numerous reservoirs. This paper proposes a learn-to-learn method, which can make better use of the vast amount of historical data from various reservoirs. Intuitively, the proposed method first learns how to learn from samples before directly learning the rules in those samples. Technically, by utilizing gradients from networks with independent parameters and copied structure in each class of reservoirs, the proposed network obtains optimal shared initial parameters, which are regarded as information transferable across different classes. On that basis, the network is able to predict future production indices for the target reservoir after training on only a very limited number of samples collected from reservoirs in the same class. Two cases further demonstrate that it is more accurate than other widely used network methods.
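The notion of shared initial parameters learned across classes can be sketched with a first-order, Reptile-style meta-learning loop. This is an assumption-laden toy, not the paper's network: the model is a single weight `w` fitting `y = w * x`, each "class" is a hypothetical task with its own slope, and the function names and learning rates are invented for illustration.

```python
def adapt(w, task, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one task's mean squared error,
    starting from the shared initial weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in task) / len(task)
        w -= lr * grad
    return w


def meta_train(tasks, w=0.0, meta_lr=0.5, epochs=50):
    """Outer loop: move the shared initial weight toward each task's adapted
    weight (a first-order, Reptile-style update; an assumption here)."""
    for _ in range(epochs):
        for task in tasks:
            w += meta_lr * (adapt(w, task) - w)
    return w


# Hypothetical tasks: each class follows y = slope * x with a different slope.
tasks = [[(1.0, s * 1.0), (2.0, s * 2.0)] for s in (1.0, 2.0, 3.0)]
shared_init = meta_train(tasks)

# A new task with few samples is fitted quickly from the shared initialization.
new_task = [(1.0, 2.5), (2.0, 5.0)]
fitted = adapt(shared_init, new_task)
```

The shared initialization settles between the task slopes, so a handful of gradient steps on a new task with very few samples is enough to get close to its slope, which mirrors the few-sample setting described in the abstract.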
Funding: This project was supported by the National "863" High Technology Research and Development Program of China (2003AA142160) and the National Natural Science Foundation of China (60402019).
Abstract: The traditional multiple pattern matching algorithm, the deterministic finite state automaton, is implemented with a tree structure. A new algorithm is proposed that substitutes a sequential binary tree for the traditional tree. Experiments show that the algorithm has three features: its construction process is quick, its memory cost is small, and its searching process is as quick as that of the traditional algorithm. The algorithm is suitable for applications that require the patterns to be preprocessed dynamically.
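For orientation, the traditional tree-based automaton that this abstract takes as its baseline can be sketched as a standard Aho–Corasick matcher. This is a generic textbook sketch, not the sequential binary tree variant the paper proposes; the function names and the list-of-dicts state layout are choices made for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build the pattern trie plus failure links (Aho-Corasick).
    goto[s] maps a character to the next state, fail[s] is the fallback
    state, and out[s] lists the patterns that end at state s."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first traversal so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


hits = search("ushers", build_automaton(["he", "she", "his", "hers"]))
```

Each state here stores its children in a hash map, which is simple but memory-hungry; the memory cost of such node layouts is the kind of overhead the abstract says its alternative tree representation reduces.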
Funding: The National Natural Science Foundation of China, China (Grant Nos. 52227804, 52174010); Strategic Cooperation Technology Projects of CNPC and CUPB, China (Grant No. ZLZX2020-01); Sinopec Key Laboratory of Drilling Completion and Fracturing of Shale Oil and Gas, China (Grant No. 35800000-22-ZC0699-0004); and the Key Projects of Scientific Research Plan in Colleges and Universities of Xinjiang Uygur Autonomous Region, China (Grant No. XJEDU20211028).
Abstract: The backreaming operation plays a significant role in safe drilling of horizontal wellbores, but it may cause severe stuck pipe accidents. To lower the risk of stuck pipe in backreaming operations, the mechanism of cuttings transport needs to be carefully investigated. In this research, a transient cuttings transport model with multiple flow patterns is developed to predict the evolution of cuttings transported in the annulus while backreaming. The established model can predict the distribution of the cuttings bed along the wellbore, considering the bulldozer effect caused by large-size drilling tools (LSDTs). Sensitivity analyses of the size of LSDTs and of the backreaming operating parameters are conducted in Section 4. A new theory is proposed to explain the mechanism of cuttings transport in the backreaming operation, in which both the bit and the LSDTs have a "cleaning effect" and a "plugging effect". The results demonstrate that the cuttings bed in the annulus is in a state of dynamic equilibrium, but the overall trend and the distribution pattern are clear. First, larger diameters and longer drilling tools could lead to a higher risk of stuck pipe. Second, we find that a higher flow rate is not always better for hole cleaning, so three flow-rate intervals are discussed separately under the given conditions. When the "dangerous flow rate" (<33 L/s in Case 4) is employed, the cuttings bed completely blocks the borehole near the step surface and directly causes a stuck pipe. If the flow rate increases to the "low flow rate" interval (33-35 L/s in Case 4), a smaller flow rate instead facilitates borehole cleaning. If the flow rate is large enough to be in the "high flow rate" interval (>35 L/s in Case 4), the higher the flow rate, the better the cleaning of cuttings beds. Third, an interval of tripping velocity called the "dangerous velocity" is proposed, in which cuttings bed accumulation near the LSDTs is more serious than at other tripping velocities. As long as the applied tripping velocity is not within the "dangerous velocity" interval (0.4-0.5 m/s in Case 5) during the backreaming operation, the risk of stuck pipe can be effectively controlled. Finally, the factor analyses of annular geometry, particle properties, and fluid properties in Section 5 show that the "low flow rate", "high flow rate" and "dangerous flow rate" tend to decrease and the "dangerous velocity" tends to increase as conditions become more favorable for hole cleaning. This study offers guidance for risk prediction and parameter setting in the backreaming operation.
Funding: Supported by China MOST project (No. 2012BAH46B04).
Abstract: Pattern matching is a fundamental approach to detecting malicious behaviors and information over the Internet, and it has gradually been adopted in high-speed network traffic analysis. However, multi-pattern matching on online compressed network traffic (CNT) is a performance bottleneck, because malicious and intrusion codes are often embedded in compressed network traffic. In this paper, we propose an online fast multi-pattern matching algorithm on compressed network traffic (FMMCN). FMMCN employs two types of jumping, i.e., jumping during the sliding window and a string jump scanning strategy, to skip unnecessary compressed bytes. Moreover, FMMCN can efficiently process multiple large-volume networks such as HTTP traffic, vehicle traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470 MB/s on a midrange switch device, which is almost 73.15 MB/s faster than ACCH, the current fastest algorithm.
Funding: Supported by the National Key Research and Development Project (Grant No. 2018YFC1900800-5), the National Natural Science Foundation of China (Grant Nos. 61890930-5, 61903010, 62021003 and 62125301), the Beijing Natural Science Foundation (Grant No. KZ202110005009), and the Beijing Outstanding Young Scientist Program (Grant No. BJJWZYJH 01201910005020).
Abstract: Due to sensor malfunctions and communication faults, multiple missing patterns frequently occur in the wastewater treatment process (WWTP). Nevertheless, existing missing data imputation methods cannot cope with multiple missing patterns because they do not make sufficient use of the information in the data. In this article, a double-cycle weighted imputation (DCWI) method is proposed to deal with multiple missing patterns by maximizing the utilization of the available information in variables and instances. The proposed DCWI comprises two components: a double-cycle-based imputation sorting and a weighted K nearest neighbor-based imputation estimator. First, the double-cycle mechanism, associated with missing variable sorting and missing instance sorting, is applied to direct the imputation of missing values. Second, the weighted K nearest neighbor-based imputation estimator is used to acquire globally similar instances and capture the volatility in the local region. The estimator preserves the original data characteristics as much as possible and enhances the imputation accuracy. Finally, experimental results on simulated and real WWTP datasets with non-stationarity and nonlinearity demonstrate that the proposed DCWI produces more accurate imputation results than the comparison methods under different missing patterns and missing ratios.
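The weighted K nearest neighbor estimator mentioned in this abstract can be illustrated with a generic inverse-distance version. This is a minimal sketch under assumptions, not the DCWI method: it omits the double-cycle sorting entirely, assumes the candidate rows are complete, and uses a hypothetical function name and an inverse-distance weighting that the abstract does not specify.

```python
import math


def knn_impute(rows, target, k=2):
    """Fill the None entries of `target` using the k rows nearest to it,
    measured on the features that `target` does have. Each neighbour's
    contribution is weighted by the inverse of its distance (an assumption)."""
    observed = [j for j, value in enumerate(target) if value is not None]

    def distance(row):
        return math.sqrt(sum((row[j] - target[j]) ** 2 for j in observed))

    neighbours = sorted(rows, key=distance)[:k]
    # The small constant avoids division by zero for an exact match.
    weights = [1.0 / (distance(row) + 1e-9) for row in neighbours]
    total = sum(weights)
    return [
        value if value is not None
        else sum(w * row[j] for w, row in zip(weights, neighbours)) / total
        for j, value in enumerate(target)
    ]


complete_rows = [[1.0, 10.0], [3.0, 30.0], [100.0, 1000.0]]
filled = knn_impute(complete_rows, [2.0, None], k=2)
```

In this example the two nearest rows are equally distant from the target, so the missing value is filled with their plain average, while the distant third row is ignored.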
Abstract: Objectives: Genomic signatures like k-mers have become one of the most prominent approaches to describing genomic data. As a result, myriad real-world applications, such as the construction of de Bruijn graphs in genome assembly, have benefited from recognizing genomic signatures. In other words, an efficient approach to genomic signature profiling is essential for tackling high-throughput sequencing reads. However, most existing approaches only recognize fixed-size k-mers, while many research studies have shown the importance of considering variable-length k-mers. Methods: In this paper, we present a novel genomic signature profiling approach, TahcoRoll, which extends the Aho–Corasick algorithm (AC) to the task of profiling variable-length k-mers. We first group nucleotides into two clusters and represent each cluster with a bit. The rolling hash technique is further utilized to encode signatures and read patterns for efficient matching. Results: In extensive experiments, TahcoRoll significantly outperforms state-of-the-art k-mer counters and is capable of processing reads across different sequencing platforms on a budget desktop computer. Conclusions: The single-thread version of TahcoRoll is as efficient as the eight-thread version of the state-of-the-art JellyFish, while the eight-thread TahcoRoll outperforms the eight-thread JellyFish by a factor of at least four.
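The two ideas named in the Methods part, one bit per nucleotide cluster and a rolling hash over the read, can be sketched for a single fixed-length signature. This is a simplified illustration, not TahcoRoll itself: the grouping of A/G versus C/T is a hypothetical choice (the abstract does not say which nucleotides share a cluster), the Aho–Corasick extension for variable-length k-mers is left out, and the function names are invented.

```python
# Hypothetical two-cluster grouping: the abstract does not state the actual one.
BIT = {"A": 0, "G": 0, "C": 1, "T": 1}


def encode(kmer):
    """Pack a k-mer into an integer using one bit per nucleotide."""
    code = 0
    for base in kmer:
        code = (code << 1) | BIT[base]
    return code


def find_kmer(read, kmer):
    """Return the start positions of `kmer` in `read`. A rolling bit code
    filters candidate windows; an exact comparison then removes the false
    positives caused by collapsing four nucleotides into two clusters."""
    k = len(kmer)
    target = encode(kmer)
    mask = (1 << k) - 1
    code, hits = 0, []
    for i, base in enumerate(read):
        # Roll the window: shift in the new bit, drop the oldest one.
        code = ((code << 1) | BIT[base]) & mask
        start = i - k + 1
        if start >= 0 and code == target and read[start:i + 1] == kmer:
            hits.append(start)
    return hits


positions = find_kmer("ACGTACGT", "GTA")
```

With this grouping, "ACG" and "GTA" share the same bit code, which is why the exact comparison after the code match is needed: the bit code is a cheap filter rather than a unique identifier.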