期刊文献+

一种带有长度和位置约束的字符串索引方法

A String Collection Indexing Method with String Length and Position Constraint
下载PDF
导出
摘要 提出了一种基于BWT(Burrows-wheeler-transform)的字符串集合的索引方法,以解决带有匹配字符串长度和匹配子串位置约束的子串确切匹配查找问题.讨论了BWT和基于BWT索引进行确切子串查找的基本原理.分析了字符串集合、匹配字符串长度和匹配子串位置约束对原BWT索引的影响.重点解决了快速地从匹配后缀位置到字符串ID和匹配子串位置的计算问题.在3个真实的数据集上进行了比对实验,结果表明:所提出的基于BWT索引方法在没有增加原索引大小的情况下,大大提升了带有匹配字符串长度和匹配位置约束的确切子串的查找的性能,因此该算法更加适用于大规模的字符串集合的索引进行近似字符串匹配和连接. An index method of string collection was proposed based on BWT( Burrows-wheelertransform)for solving the exact substring queries with string length and matching position constraints. Firstly,the BWT and exact string query based on it were discussed. Then the impact of string collection,string length and substring position upon the original BWT index was analyzed. Finally,the fast calculation problem was discussed and solved from the position of the matching suffix to the string ID and position on the string of the matching substring. The approximate string matching was conducted on three real string collections and compared the results of index method proposed and the original one. The experimental results showed that the method proposed based on BWT speeded up the process of exact substring queries with string length and matching position constraints considerably in the case of keeping the index size.Therefore,the proposed method was suitable for indexing large-scale string collection for string similarity match and joint.
作者 于长永 高明 柏禄一 赵宇海 YU Chang-yong;GAO Ming;BAI Lu-yi;ZHAO Yu-hai(School of Computer and Communication Engineering,Northeastern University at Qinhuangdao,Qinhuangdao 066004,China;School of Computer Science & Engineering,Northeastern University,Shenyang 110169,China.)
出处 《东北大学学报(自然科学版)》 EI CAS CSCD 北大核心 2018年第7期959-963,共5页 Journal of Northeastern University(Natural Science)
基金 国家自然科学基金资助项目(61772124 61332014 61401080 61402087) 河北省自然科学基金资助项目(F2015501049) 河北省教育厅项目(QN2014339) 中央高校基本科研业务费专项资金资助项目(N150402002)
关键词 BWT 字符串索引 倒排链表 字符串近似匹配 序列比对 BWT (Burrows-Wheeler-transform) string index inverted list string similarity match sequence alignment
  • 相关文献

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部