Because the small CACHE size of computers, the scanning speed of DFA based multi-pattern string-matching algorithms slows down rapidly especially when the number of patterns is very large. For solving such problems, w...Because the small CACHE size of computers, the scanning speed of DFA based multi-pattern string-matching algorithms slows down rapidly especially when the number of patterns is very large. For solving such problems, we cut down the scanning time of those algorithms (i.e. DFA based) by rearranging the states table and shrinking the DFA alphabet size. Both the methods can decrease the probability of large-scale random memory accessing and increase the probability of continuously memory accessing. Then the hitting rate of the CACHE is increased and the searching time of on the DFA is reduced. Shrinking the alphabet size of the DFA also reduces the storage complication. The AC++algorithm, by optimizing the Aho-Corasick (i.e. AC) algorithm using such methods, proves the theoretical analysis. And the experimentation results show that the scanning time of AC++and the storage occupied is better than that of AC in most cases and the result is much attractive when the number of patterns is very large. Because DFA is a widely used base algorithm in may string matching algorithms, such as DAWG, SBOM etc., the optimizing method discussed is significant in practice.展开更多
文摘Because the small CACHE size of computers, the scanning speed of DFA based multi-pattern string-matching algorithms slows down rapidly especially when the number of patterns is very large. For solving such problems, we cut down the scanning time of those algorithms (i.e. DFA based) by rearranging the states table and shrinking the DFA alphabet size. Both the methods can decrease the probability of large-scale random memory accessing and increase the probability of continuously memory accessing. Then the hitting rate of the CACHE is increased and the searching time of on the DFA is reduced. Shrinking the alphabet size of the DFA also reduces the storage complication. The AC++algorithm, by optimizing the Aho-Corasick (i.e. AC) algorithm using such methods, proves the theoretical analysis. And the experimentation results show that the scanning time of AC++and the storage occupied is better than that of AC in most cases and the result is much attractive when the number of patterns is very large. Because DFA is a widely used base algorithm in may string matching algorithms, such as DAWG, SBOM etc., the optimizing method discussed is significant in practice.