14 articles found
Energy Cost Minimization Using String Matching Algorithm in Geo-Distributed Data Centers
1
Authors: Muhammad Imran Khan Khalil, Syed Adeel Ali Shah, Izaz Ahmad Khan, Mohammad Hijji, Muhammad Shiraz, Qaisar Shaheen. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 6305-6322, 18 pages
Data centers are being distributed worldwide by cloud service providers (CSPs) to save energy costs through efficient workload allocation strategies. Many CSPs are challenged by the significant rise in user demands due to their extensive energy consumption during workload processing. Numerous research studies have examined distinct operating cost mitigation techniques for geo-distributed data centers (DCs). However, operating cost savings during workload processing that also consider string-matching techniques in geo-distributed DCs remain unexplored. In this research, we propose a novel string matching-based geographical load balancing (SMGLB) technique to mitigate the operating cost of the geo-distributed DC. The primary goal of this study is to use a string-matching algorithm (i.e., Boyer Moore) to compare the contents of incoming workloads to those of documents that have already been processed in a data center. On a successful match, the global load balancer does not send the user's request to a data center for processing; instead, it displays the results of the previously processed workload to the user, which saves energy. Conversely, if no match is found, the global load balancer allocates the incoming workload to a specific DC for processing, considering variable energy prices, the number of active servers, on-site green energy, and traces of incoming workload. The results of numerical evaluations show that SMGLB can reduce the operating expenses of geo-distributed data centers more than the existing workload distribution techniques.
Keywords: String matching; Optimization; geo-distributed data centers; geographical load balancing; green energy
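The match-then-dispatch flow described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SMGLB implementation: it uses the Horspool simplification of Boyer-Moore (bad-character rule only), treats a "match" as the workload text occurring inside an already processed document, and falls back to the lowest-price data center while ignoring active servers, green energy, and workload traces. The names `horspool_find`, `route`, `processed`, and `prices` are hypothetical.

```python
def horspool_find(text, pattern):
    """Return the first index of pattern in text, or -1 (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: distance from the rightmost occurrence of each
    # character (excluding the last pattern position) to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the table entry of the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def route(workload, processed, prices):
    """Serve from previously processed results on a match, else pick a data center.

    processed: dict mapping processed document text -> stored result (assumed shape)
    prices:    dict mapping data center name -> current energy price (assumed shape)
    """
    for document, result in processed.items():
        if horspool_find(document, workload) != -1:
            return ("cache", result)
    # Simplification: choose by energy price alone.
    return ("dispatch", min(prices, key=prices.get))
```

A fuller sketch would replace the `min` call with a cost model over the factors the abstract lists; the abstract does not give that model, so it is left out here.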
Surface reconstruction of complex contour lines based on chain code matching technique (cited by: 1)
2
Authors: 姜晓彤. Journal of Southeast University (English Edition) (EI, CAS), 2005, No. 4, pp. 432-435, 4 pages
A new method for solving the tiling problem of surface reconstruction is proposed. The proposed method uses a snake algorithm to segment the original images; the contours are then transformed into strings by Freeman's code. A symbolic string matching technique is applied to establish a correspondence between the two consecutive contours. The surface is composed of the pieces reconstructed from the corresponding points. Experimental results show that the proposed method performs well in terms of surface reconstruction quality, and its time complexity is proportional to mn, where m and n are the numbers of vertices of the two consecutive slices, respectively.
Keywords: chain code; string matching; surface reconstruction; local shape feature
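The two steps this abstract names, encoding a contour as a Freeman chain code string and comparing two such strings in time proportional to mn, can be illustrated with a short sketch. The direction numbering (0 = east, counter-clockwise, y axis pointing up) is one common convention and is an assumption here, and plain edit distance stands in for the paper's symbolic string matching technique, which the abstract does not specify. `chain_code` and `edit_distance` are hypothetical names.

```python
# 8-connected Freeman directions: (dx, dy) -> code, 0 = east, counter-clockwise.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode a sequence of 8-connected contour points as a Freeman code string."""
    return "".join(
        str(DIRECTIONS[(x1 - x0, y1 - y0)])
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]
```

For a unit square traversed counter-clockwise from the origin, `chain_code` yields one code per edge; the distance between the codes of two consecutive slices gives a rough measure of how similar their contours are.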
A Mathematical Solution to String Matching for Big Data Linking (cited by: 1)
3
Authors: Kevin McCormack, Mary Smyth. Journal of Statistical Science and Application, 2017, No. 2, pp. 39-55, 17 pages
This paper describes how data records can be matched across large datasets using a technique called the Identity Correlation Approach (ICA). The ICA technique is then compared with a string matching exercise. Both the string matching exercise and the ICA technique were employed for a big data project carried out by the CSO. The project was called the SESADP (Structure of Earnings Survey Administrative Data Project) and involved linking the Irish Census dataset 2011 to a large Public Sector Dataset. The ICA technique provides a mathematical tool to link the datasets, and the matching rate for an exact match can be calculated before the matching process begins. Based on the number of variables and the size of the population, the matching rate is calculated in the ICA approach from the MRUI (Matching Rate for Unique Identifier) formula, and false positives are eliminated. No string matching is used in the ICA; therefore, names are not required on the dataset, making the data more secure and ensuring confidentiality. The SESADP Project was highly successful using the ICA technique. A comparison of the results using a string matching exercise for the SESADP and the ICA is discussed here.
Keywords: Big Data; Data Linking; Identity Correlation Approach; String matching; Public Sector Datasets; Data Privacy
A Novel Mathematical Model for Similarity Search in Pattern Matching Algorithms (cited by: 1)
4
Authors: P. Vinod-Prasad. Journal of Computer and Communications, 2020, No. 9, pp. 94-99, 6 pages
Modern applications require large databases to be searched for regions that are similar to a given pattern. DNA sequence analysis, speech and text recognition, artificial intelligence, the Internet of Things, and many other applications depend heavily on pattern matching or similarity searches. In this paper, we discuss some of the string matching solutions developed in the past. Then, we present a novel mathematical model to search for a given pattern and its close approximations in the text.
Keywords: String matching; Pattern matching; Similarity Search; Substring Search
Screen Content Coding with Primary and Secondary Reference Buffers for String Matching and Copying
5
Authors: Tao Lin, Kailun Zhou, Liping Zhao. ZTE Communications, 2015, No. 4, pp. 53-60, 8 pages
A screen content coding (SCC) algorithm that uses a primary reference buffer (PRB) and a secondary reference buffer (SRB) for string matching and string copying is proposed. The PRB is typically the traditional reconstructed picture buffer, which provides reference string pixels for the current pixels being coded. The SRB stores a few recently and frequently referenced pixels for repetitive reference by the current pixels being coded. In the encoder, the search for an optimal reference string is performed in both the PRB and the SRB, and either a PRB or an SRB string is selected as the optimal reference string on a string-by-string basis. Compared with the HM-16.4+SCM-40 reference software, the proposed SCC algorithm improves coding performance, measured by bit-distortion rate reduction, by 4.19% on average in the all-intra configuration for the "text and graphics with motion" category of test sequences defined by the JCT-VC common test condition.
Keywords: HEVC; Image Coding; Screen Content Coding; String matching; Video Coding
Fast algorithm on string cross pattern matching
6
Authors: Liu Gongshen, Li Jianhua, Li Shenghong. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2005, No. 1, pp. 179-186, 8 pages
Given a set U consisting of strings defined on an alphabet Σ, string cross pattern matching finds all the matches between every two strings in U. It is used in text processing tasks such as removing duplicate strings. This paper presents a fast string cross pattern matching algorithm based on extracting high frequency strings. Compared with existing algorithms, including single-pattern and multi-pattern matching algorithms, this algorithm features both low time complexity and low space complexity. Because the Chinese alphabet is large and the average length of Chinese words is short, this algorithm is well suited to processing Chinese text, especially when the size of Σ is large and the number of strings is far greater than the maximum length of the strings in U.
Keywords: pattern matching; high frequency string; string cross pattern matching
Parallel Quick Search Algorithm for the Exact String Matching Problem Using OpenMP
7
Authors: Sinan Sameer Mahmood Al-Dabbagh, Nawaf Hazim Barnouti, Mustafa Abdul Sahib Naser, Zaid G. Ali. Journal of Computer and Communications, 2016, No. 13, pp. 1-11, 11 pages
String matching is seen as one of the essential problems in computer science. A variety of computer applications provide a string matching service for their end users. The remarkable growth in the amount of data created and kept by modern computational devices pushes researchers to find even more powerful methods for coping with this problem. In this research, the Quick Search string matching algorithm is implemented in a multi-core environment using OpenMP directives, which can be employed to reduce the overall execution time of the program. English text, protein, and DNA data types are used to examine the effect of parallelizing and implementing the Quick Search string matching algorithm in a multi-core environment. Experimental outcomes reveal that the overall performance of the algorithm has been improved, and the improvement in execution time is considerable enough to recommend the multi-core environment as a suitable platform for parallelizing the Quick Search string matching algorithm.
Keywords: String matching; Pattern matching; String Searching; Algorithms; Quick Search Algorithm; Exact String matching Algorithm; Parallelization; OpenMP
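The idea of splitting a Quick Search scan across workers can be sketched as below. This is an illustration only: the paper uses OpenMP directives in a compiled language, whereas this sketch uses Python threads as a stand-in, so it shows the partitioning logic rather than a real speed-up. Each worker owns a range of start positions and may read past the end of its range, so no match on a chunk boundary is lost. The names `quick_search` and `parallel_quick_search` and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def quick_search(text, pattern, start=0, end=None):
    """Return all match positions p with start <= p < end (Sunday's Quick Search)."""
    n, m = len(text), len(pattern)
    if end is None:
        end = n
    if m == 0 or m > n:
        return []
    # Shift is decided by the character just after the current window:
    # m - (rightmost index of that character), or m + 1 if it is absent.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    pos = start
    last = min(end - 1, n - m)
    while pos <= last:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        nxt = pos + m
        if nxt >= n:
            break
        pos += shift.get(text[nxt], m + 1)
    return hits


def parallel_quick_search(text, pattern, workers=4):
    """Split the start positions into contiguous ranges, one task per range."""
    n = len(text)
    step = max(1, -(-n // workers))  # ceiling division
    ranges = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: quick_search(text, pattern, *r), ranges)
    return [p for part in parts for p in part]
```

Because the ranges are disjoint and processed in order by `map`, the combined result is already sorted and contains no duplicates.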
A Fast Pattern Matching Algorithm Using Changing Consecutive Characters
8
Authors: Amjad Hudaib, Dima Suleiman, Arafat Awajan. Journal of Software Engineering and Applications, 2016, No. 8, pp. 399-411, 13 pages
Pattern matching algorithms are very important in many applications such as search engines and DNA analysis; they aim to find a pattern in a text. This paper proposes a Pattern Matching Algorithm Using Changing Consecutive Characters (PMCCC) to make the searching process faster. PMCCC enhances the shift process that determines how the pattern moves when a mismatch occurs between the pattern and the text. It enhances the Berry Ravindran (BR) shift function by using m consecutive characters, where m is the pattern length. The formal basis and the algorithms are presented. The experimental results show that PMCCC improves the searching process by reducing the number of comparisons and the number of attempts. Comparing the results of PMCCC with other related algorithms shows significant improvements in the average number of comparisons and the average number of attempts.
Keywords: Pattern; Pattern matching Algorithms; String matching; Berry Ravindran; EBR; RS-A Fast Pattern matching Algorithms
Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches (cited by: 1)
9
Authors: Koraljka Golub, Johan Hagelback, Anders Ardo. Journal of Data and Information Science (CSCD), 2020, No. 1, pp. 18-38, 21 pages
Purpose: With more and more digital collections of various information resources becoming available, the challenge of assigning subject index terms and classes from quality knowledge organization systems is also increasing. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC. Design/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research involved 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Findings: Evaluation shows that Support Vector Machine with linear kernel outperforms other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than Support Vector Machine, but reach close results, with the benefit of a smaller representation size. Impact of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available in the training set, and 66.13% when too few records (often fewer than 100 per class) on which to train are available; these figures hold only for the top 3 hierarchical levels (803 instead of 14,413 classes). Research limitations: Having to reduce the number of hierarchical levels to the top three levels of DDC because of the lack of training data for all classes skews the results so that they work in experimental conditions but barely for end users in operational retrieval systems. Practical implications: In conclusion, for operative information retrieval systems, applying purely automatic DDC does not work, either using machine learning (because of the lack of training data for the large number of DDC classes) or using a string-matching algorithm (because DDC characteristics perform well for automatic classification only in a small number of classes). Over time, more training examples may become available, and DDC may be enriched with synonyms in order to enhance accuracy of automatic classification, which may also benefit information retrieval performance based on DDC. In order for quality information services to reach the objective of highest possible precision and recall, automatic classification should never be implemented on its own; instead, machine-aided indexing that combines the efficiency of automatic suggestions with the quality of human decisions at the final stage should be the way for the future. Originality/value: The study explored machine learning on a large classification system of over 14,000 classes which is used in operational information retrieval systems. Due to lack of sufficient training data across the entire set of classes, an approach complementing machine learning, that of string matching, was applied. This combination should be explored further since it provides the potential for real-life applications with large target classification systems.
Keywords: LIBRIS; Dewey Decimal Classification; Automatic classification; Machine learning; Support Vector Machine; Multinomial Naive Bayes; Simple linear network; Standard neural network; 1D convolutional neural network; Recurrent neural network; Word embeddings; String matching
Screen Content Coding in HEVC and Beyond
10
Authors: LIN Tao, ZHAO Liping, ZHOU Kailun. ZTE Communications, 2016, No. B06, pp. 51-58, 8 pages
1 Introduction. The screen content coding (SCC) standard [1] for high efficiency video coding (HEVC) is an international standard specially developed for screen content.
Keywords: HEVC; AVS; Screen Content Coding; String matching; Video Coding
Improving Classification Performance with Single-category Concept Match
11
Authors: 尹中航 Wang Yongcheng Song Juping Cai Wei. High Technology Letters (EI, CAS), 2001, No. 4, pp. 20-22, 3 pages
Setting aside increasingly complicated algorithms, this paper presents a new classification algorithm based on single-category concept matching. It also introduces the method for finding such concepts, which is important to the algorithm. Experimental results show that it can improve classification precision and accelerate classification to some extent.
Keywords: Subject concept; String match; Information processing
A method for improving the accuracy of automatic indexing of Chinese-English mixed documents
12
Authors: Yan ZHAO, Hui SHI. Chinese Journal of Library and Information Science, 2012, No. 4, pp. 77-92, 16 pages
Purpose: The thrust of this paper is to present a method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and cybernetics theory, we proposed an integrated control method for indexing documents. It consists of 'feed-forward control', 'in-progress control' and 'feed-back control', aiming at improving the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of our proposed method. Findings: This method distinguishes Chinese and English documents in grammatical structures and word formation rules. Through the implementation of this method in the three phases of automatic indexing for the Chinese-English mixed documents, the results were encouraging. The precision increased from 88.54% to 97.10% and recall improved from 97.37% to 99.47%. Research limitations: The indexing method is relatively complicated and the whole indexing process requires substantial human intervention. Due to pattern matching based on a brute-force (BF) approach, the indexing efficiency has been reduced to some extent. Practical implications: The research is of both theoretical significance and practical value in improving the accuracy of automatic indexing of multilingual documents (not confined to Chinese-English mixed documents). The proposed method will benefit not only the indexing of life science documents but also the indexing of documents in other subject areas. Originality/value: So far, few studies have been published about the method for increasing the accuracy of multilingual automatic indexing. This study will provide insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
Keywords: Chinese-English mixed documents; String matching; Accuracy of automatic indexing; Cybernetics; Dedicated hepatitis B virus (HBV) database
Memory Efficient String Matching Algorithm for Network Intrusion Management System (cited by: 9)
13
Authors: 余建明, 薛一波, 李军. Tsinghua Science and Technology (SCIE, EI, CAS), 2007, No. 5, pp. 585-593, 9 pages
As the core algorithm and the most time-consuming part of almost every modern network intrusion management system (NIMS), string matching is essential for the inspection of network flows at line speed. This paper presents a memory- and time-efficient string matching algorithm specifically designed for NIMS on commodity processors. Modifications of the Aho-Corasick (AC) algorithm based on the distribution characteristics of NIMS patterns drastically reduce the memory usage without sacrificing speed in software implementations. In tests on the Snort pattern set and traces that represent typical NIMS workloads, the Snort performance was enhanced 1.48%-20% compared to other well-known alternatives, with an automaton size reduction of 4.86-6.11 compared to the standard AC implementation. The results show that special characteristics of the NIMS can be exploited very effectively to optimize the algorithm design.
Keywords: string matching; network intrusion management system (NIMS); Aho-Corasick (AC) algorithm
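For readers unfamiliar with the baseline this abstract modifies, the following is a compact sketch of the standard Aho-Corasick automaton. It does not include the memory-reducing modifications the paper proposes, which the abstract does not describe. The dictionary-per-state layout and the names `build_automaton` and `search` are choices made for this illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for standard Aho-Corasick."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Phase 2: breadth-first pass to set failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence of every pattern."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

The text is scanned once regardless of how many patterns are loaded, which is why this family of automata is attractive for signature sets such as Snort's; the per-state tables are also where the memory cost that the paper targets comes from.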
Multi-Pattern Matching Algorithm with Wildcards Based on Bit-Parallelism
14
Authors: Ahmed A. F. Saif, HU Liang, CHU Jianfeng. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2017, No. 2, pp. 178-184, 7 pages
Multi-pattern matching with wildcards is the problem of finding the occurrences of all patterns in a pattern set {p^1, …, p^k} in a given text t. If the percentage of wildcards in the pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards to overcome the high percentage of wildcard occurrences in patterns. In our proposed method, patterns are matched as bit patterns using a sliding window approach. The window is a bit window that slides along the given text, matching against stored bit patterns. The matching process is executed using bitwise operations. The experimental results demonstrate that the percentage of wildcard occurrence does not affect the proposed algorithm's performance and that the proposed algorithm is more efficient than the algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ)(m/w)) time, where n is text length, d is symbol distribution over k patterns, m is pattern length, and σ is alphabet size.
Keywords: multi-pattern; string matching; wildcard; bit-parallelism
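The general idea of matching with wildcards through bitwise operations can be illustrated with the classic Shift-And scheme. This is a single-pattern sketch under stated assumptions, not the multi-pattern sliding bit-window algorithm of the paper: the wildcard symbol `?` (matching exactly one character) and the name `shift_and_wildcard` are hypothetical, and Python's arbitrary-size integers stand in for machine words.

```python
def shift_and_wildcard(text, pattern, wildcard="?"):
    """Return all start positions where pattern matches text (Shift-And).

    Bit i of the state is set when pattern[0..i] matches the text ending at
    the current character; a wildcard position accepts any character.
    """
    m = len(pattern)
    if m == 0:
        return []
    wild = 0    # bits of positions holding a wildcard
    masks = {}  # character -> bits of positions holding that character
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            wild |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state, hits = 0, []
    for j, ch in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & (masks.get(ch, 0) | wild)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

Each text character costs one shift, one OR, and one AND, so the number of wildcards in the pattern does not change the work per character, which is consistent with the behaviour the abstract reports for its own method.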