期刊文献+

支持块编辑距离的索引结构 被引量:3

Index Structures for Supporting Block Edit Distance
下载PDF
导出
摘要 在近似字符串匹配中,传统的编辑距离不能很好地衡量诸如人名、地址等数据的相似关系,而块编辑距离可以很好地衡量两个字符串的相似性.如何有效地支持块编辑距离,进行近似字符串查询处理具有重要的意义.计算两个字符串的块编辑距离是一个NP完全问题,因此希望提供有效的方法可以增强过滤能力,并减少假通过率.设计了一种支持移动编辑距离的新颖的索引结构SHV-Trie,通过研究移动编辑距离的操作特性,使用字母出现的频率作为支持移动编辑距离操作的一个下界,并且提出相应的查询过滤算法,同时,针对索引SHV-Trie的空间开销过大的问题,提出一种优化字母排列的索引结构和一种压缩的索引结构及相关查询过滤算法.真实数据集上的实验结果与分析显示了所提出的索引结构具有良好的过滤能力,并通过减少效率假通过率提高查询的效率. In approximate string matching,the traditional edit distance cannot evaluate the similarity between strings very well,especially for the name,address datasets,etc. The block edit distance,however,can do the job easily. It is important to efficiently support block edit distance for approximate string query processing. Since computing the block edit distance between two strings is an NP-Complete problem,it is desired to provide solutions to increase filterability and decrease false positives. In this paper,a novel index structure,called SHV-Trie,is proposed. A lower bound of the block edit distances is presented according to the features of the block edit distance with move operations,i.e. the frequencies of each character in a string. A corresponding query filter approach is proposed based on the lower bound on character frequencies. Meanwhile,considering the large space cost problem,an optimized ordered character index structure and a compressed index structure are proposed. The corresponding query filtering approaches are further given based on the optimized and compressed index structures. The experimental results and analysis on real data sets show that the proposed index structures can provide good filtering ability and high query performance by decreasing false positives.
出处 《计算机研究与发展》 EI CSCD 北大核心 2010年第1期191-199,共9页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60828004 60973018) 教育部新世纪优秀人才支持计划基金项目(NCET-06-0290) 中国人民大学数据与知识工程教育部重点实验室开放课题(2008002)
关键词 近似字符串匹配 块编辑距离 压缩 索引 NP完全问题 approximate string matching block edit distance compression index NP-complete problem
  • 相关文献

参考文献11

  • 1Shapira D, Storer J A. Edit distance with move operations [C]//LNCS2373: Proe of Combinatorial Pattern Matching. Berlin: Springer, 2002:85-98.
  • 2Shapira D, Storer J A. Edit distance with move operations [J]. Journal of Discrete Algorithms, 2007, 5(2) : 380-392.
  • 3Croehemore M, Rytter W. Text Algorithms [M]. UK: Oxford University Press, 1995.
  • 4Gusfield D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computation [M]. Cambridge: Cambridge University Press, 1997.
  • 5Crochemore M, Rytter W. Jewels of Stringology [M]. Singapore: World Scientific, 2002.
  • 6Navarro G. A guided tour to approximate string matching [J]. ACM Computing Surveys, 2001, 33(1): 31-88.
  • 7Lopresti D, Tomkins A. Block edit models for approximate string matching [J]. Theoretical Computer Science, 1997, 181(1): 159-179.
  • 8Graham C, Muthukrishnan S. The string edit distance matching problem with moves [C]. //Proc of ACM-SIAM Symp on Discrete Algorithms. New York: ACM, 2002: 667-676.
  • 9Graham C, Muthukrishnan S. The string edit distance matching problem with moves [J]. ACM Trans on Algorithms, 2007, 3(1): 2-21.
  • 10范洪博,姚念民.一种高速精确单模式串匹配算法[J].计算机研究与发展,2009,46(8):1341-1348. 被引量:14

二级参考文献10

  • 1Holub J,Durian B.Fast variants of bit parallel approach to suffix automata[OL]. http://www.cri.haifa.ac.il/events/2005/string/presentations/Holub.pdf . 2008
  • 2Allauzen C,,Crochemore M,Raffinot M.Factor oracle:A newstructure for pattern matching[].Proc of SOFSEM.1999
  • 3Navarro G,Raffinot M.Fast and flexible string matching by combining bit-parallelismand suffix automata[OL]. http://doi.acm.org/10.1145/351827.384246 . 2008
  • 4Peltola Hannu,Tarhio Jorma.Alternative algorithms for bit-parallel string matching[].Proc of SPIRE.2003
  • 5Knuth D E,Morris J H,Pratt V R.Fast pattern matching in string[].SIAM Journal on Computing.1977
  • 6Horspool R N.Practical fast searching in strings[].Software Practice and Experience.1980
  • 7Hume A,Sunday D M.Fast string searching[].Software -Practice &Experience.1991
  • 8Sheik S.S,Aggarwal Sumit K,Poddar Anindya.A fast pattern matching algorithm[].Journal of Chemistry.2004
  • 9K.Fredriksson,S. Grabowski.Practical and Optimal String Matching[].String Processing and Information Retrieval.2005
  • 10Lecroq T.Fast exact string matching algorithm[].Information Processing Letters.2007

共引文献13

同被引文献31

引证文献3

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部