期刊文献+

一种全文索引的压缩方法

A Compressed Suffix Automaton for Full Text Indexing
原文传递
导出
摘要 全文索引广泛应用于数据库、数据压缩、模式匹配算法以及信息生物学等领域。本文研究了后缀自动机全文索引结构,针对后缀自动机空间占用大的问题提出了一种边压缩方法。该方法通过后缀链接函数模拟实现自动机的跳转边,从而删除部分跳转边。在最终的压缩结构中,跳转边的数量与状态数量一致,而在后缀自动机中跳转边的数量是状态数量的一倍。证明了对于因子判定等问题,压缩的后缀自动机与后缀自动机具有相同的时间复杂度。 Full text indexes are widely used in areas such as data base,data compression,pattern matching and bioinformatics.We present in this paper a compression method for suffix automata.By deleting some transaction edges,the suffix automata can still work like the original suffix automata without losing performance.The compressed suffix automata have edges with the number similar with that of states while in the original ones the number of edges is twice of that of states.We also proved that using the compressed suffix automata the membership problem for the factor set of a given word can be solved linear time.
作者 杨炜鸿 张猛
出处 《情报科学》 CSSCI 北大核心 2010年第11期1710-1713,共4页 Information Science
基金 国家自然科学基金项目(60873235) 教育部中央高校基本科研业务费(200903186) 吉林省科技厅自然基金项目(20101522) 吉林省教育厅项目(2009599 2010400)
关键词 文本索引 后缀自动机 压缩 full text index suffix automaton compression
  • 相关文献

参考文献16

  • 1P. Weiner. Linear pattern matching algorithm [C]. USA:In Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, IEEE,1973.
  • 2E. Ukkonen. On-line construction of suffix trees[J]. Algorithmica, Springer-Vedag, Germany, 1995,(14):249-60.
  • 3D. Gusfield. Algorithms on Strings Trees and Sequences[M]. New York:Cambridge University Press,1997:87-208.
  • 4A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M.T. Chen, and J. Seiferas. The smallest automation recognizing the subwords of a text[J]. Theoretical Computer Science, Elsevier, Holland, 1985, (40):31-55.
  • 5A. Blumer, J. Blumer, D. Haussler, R. McConnell, and A. Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis [J]. Journal of the ACM, ACM, USA, 1987, 34(3):578-595.
  • 6M. Crochemore. Transducers and repetitions [J]. Theoretical Computer Science, Elsevier, Holland, 1986,(45):63-86.
  • 7U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches [J]. SIAM Journal on Computing, ACM, USA, 1993, (22):935-948.
  • 8R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applieafions to text indexing and string matching [C]. USA:In Proceedings of the 32nd ACM Symposium on Theory of Computing, ACM, 2000.
  • 9M. Crochemore and R. Verin. Direct construction of compact directed acyclic word graphs. In Combinatorial Pattern Matching[C]. Germany: Springer, 1997.
  • 10J. Karkkainen. Suffix cactus : a cross between suffix tree and suffix array[C]. Combinatorial Pattern Matching, 1995.

二级参考文献16

  • 1Charras C, Lecroq TT. Handbook of Exact String Matching Algorithms. London: King's College London Publications, 2004.
  • 2Knuth DE, Jr. Morris JH, Pratt VR. Fast pattern matching in strings. SIAM Journal on Computing, 1977,6(1):323-350.
  • 3Baeza-Yates RA, Gonnet GH. A new approach to text searching. Communications of the ACM, 1992,35(10):74-82.
  • 4Boyer RS, Moore JS. A fast string searching algorithm. Communications of the ACM, 1977,20(10):762-772.
  • 5Sunday DM. A very fast substring search algorithm. Communications of the ACM, 1990,33(8):132-142.
  • 6Cantone D, Faro S. Fast-Search: A new efficient variant of the Boyer-Moore string matching algorithm. In: Alberto A, Massimo M, eds. Proc. of the 2nd Int'l Workshop on Experimental and Efficient Algorithms (WEA 2003). Lecture Notes in Computer Science 2647, Heidelberg: Springer-Verlag, 2003.47-58.
  • 7Crochemore M, Rytter W. Text Algorithms. Oxford: Oxford University Press, 1994.
  • 8Lecroq T. A variation on the Boyer-Moore algorithm. Theoretical Computer Science, 1992,92(1):119-144.
  • 9Crochemore M, Hancart C. Automata for matching patterns. In: Rosenberg G, Salomaa A, eds. Handbook of Formal Languages,volume 2: Linear Modeling: Background and Application. Heidelberg: Springer-Verlag, 1997. 399-462.
  • 10Yao AC. The complexity of pattern matching for a random string. SIAM Journal on Computing, 1979,8(3):368-387.

共引文献24

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部