期刊文献+

支持局部最优化匹配的近似子串查询算法 被引量:1

Approximate Substring Query Algorithms Supporting Local Optimal Matching
下载PDF
导出
摘要 基于编辑距离的字符串近似查询算法一般是先给定阈值k,然后计算那些与查询串的编辑距离小于或等于k的结果。但是对于近似子串查询,结果中有很多是交叠的,并且是无意义的,于是提出了一种局部最优化匹配的概念,只计算那些符合阈值条件,并且是局部最优的结果,这样不仅避免了结果的交叠,而且极大节省了时间开销。给出了支持局部最优化匹配的近似子串查询的定义,相应提出了一种基于gram索引的局部最优化近似子串查询算法,分析了子串近似匹配过程中的规律,研究了基于局部最优化匹配的边界限定和过滤策略,给出了一种过滤优化的局部最优化近似子串查询算法,提高了查询效率。 The algorithms of approximate string query based on edit distance have usually a given threshold k, and report those strings whose edit distances with the query are not bigger than k. However, when talking about approximate substring query, many answers got in this way have overlap, thus meaningless. So this paper proposes a concept of local optimal matching only computing those answers which are both not higher than k and local optimal, thus eliminating the overlapped answers and cutting the time cost. It also presents a definition of approximate substring query supporting local optimal matching along with a query algorithm based on gram index. Moreover, it analyzes the generality of the matching process, and studies the methods of filtering and limiting the boundary based on local optimal matching. Finally, it proposes an algorithm of local optimal approximate substring query with filtering, so that the efficiency of matching is promoted.
出处 《计算机科学与探索》 CSCD 2011年第9期769-780,共12页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金No.60973018 60973020 61173031 中央高校基本科研业务费专项资金No.N090504004 N100704001 N090104001~~
关键词 子串查询 模糊查询 gram索引 编辑距离 substring query fuzzy query gram index edit distance
  • 相关文献

参考文献16

  • 1Levenshtein V I. Binary codes capable of correcting spurious insertions and deletions of ones[J]. Problems of Information Transmission, 1965, 1(1): 8-17.
  • 2Sankoff D, Kruskal J. Time warps, string edits, and macromolecules: the theory and practice of sequence comparison[M]. Reading, UK: Addison-Wesley Publication, 1983.
  • 3Das G Fleisher R, Gasieniek L, et al. Episode matching[C]// LNCS 1264: Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching (CPM '97), Aarhus, Denmark, June 30-July 2, 1997. [S.1.]: Springer, 1997: 12-27.
  • 4Needleman S, Wunsch C. A general method applicable to the search for similarities in the amino acid sequences of two proteins[J]. Journal of Molecular Biology, 1970,48(3): 444-453.
  • 5Knuth D E, Morris J H, Pratt V R. Fast pattern matching in strings[J]. SIAM Journal on Computing, 1977, 6(2): 323-350.
  • 6Boyer R S, Moore J S. A fast string searching algorithm[J]. Communications of the ACM, 1977, 20(10): 762-772.
  • 7Kim Y, Woo K G, Park H, et al. Efficient processing of substring match queries with inverted q-gram indexes[C]// Proceedings of the IEEE 26th International Conference on Data Engineering (ICDE '10). Washington, DC, USA: IEEE Computer Society, 2010: 721-732.
  • 8Navarro G. A guided tour to approximate string matching[J]. ACM Computing Surveys, 2001, 33(1): 31-88.
  • 9Kim M S, Whang K Y, Lee J G, et al. n-gram/2L: a space and time efficient two-level n-gram inverted index structure[C]//Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05), 2005: 325-336.
  • 10Zobel J, Moffat A. Inverted files for text search engines[J]. ACM Computing Surveys, 2006, 38(2).

同被引文献18

  • 1Navarro G,Baeza-Yates R.A hybrid indexing method for approximate stringmatching. Journal of Discrete Algorithms . 2000
  • 2Rao S,Raju K B,Viswanadha S,et al.Parallel string matching with multi core processors—A comparative study for gene sequences. Global Journal of Computer Science and Technology . 2013
  • 3Peter H Sellers.The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms . 1980
  • 4Manber Udi,Gene Myers.Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing . 1993
  • 5Navarro,Gonzalo,Ricardo Baeza-Yates.A practical q-gram index for text retrieval allowing errors. CLEI Electronic Journal . 1998
  • 6Heng Li,Richard Durb.Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics . 2009
  • 7Rodenas Pico D.Algorithms acceleration of pattern-matching in multi-core architectures. . 2011
  • 8Gonzalo Navarro,Veli M?kinen.??Compressed full-text indexes(J)ACM Computing Surveys (CSUR) . 2007 (1)
  • 9Wang J,Yang X,Wang B.Cache-aware parallel approximate matching and join algorithms using BWT. Proc of the Joint EDBT/ICDT 2013 Workshops . 2013
  • 10Navarro G,Baeza—Yates R,Sufinen E,Tarhio J.Indexing methods for approximate string matching. IEEE Data Enginering Bulletin . 2001

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部