期刊文献+

一种新颖的编辑距离限制下的相似性确认算法 被引量:2

A Novel Similarity Verification Algorithm Under Edit Distance Limitation
下载PDF
导出
摘要 针对相似性确认步骤中编辑距离计算的高复杂性问题,提出了一种在编辑距离限制下的基于鸽笼原理的字符串相似性确认算法.首先找到满足编辑距离片段映射的片段,以此片段为基准,将长度为500 bp的read分段.然后对满足编辑距离片段映射的左右部分递归地进行编辑距离计算,将各段得到的编辑距离相加即为最后结果.最后根据最长公共子串的下限将需要验证的片段数目降到最低,得到优化方案.实验结果表明,基于鸽笼原理的分段递归计算编辑距离的确认算法减少了验证步骤的时间,并能保证假阳率和假阴率都为零. For the high complexity problem of edit distance calculation in the similarity confirmation step,a string similarity confirmation algorithm based on pigonhole principle under the limitation of edit distance is proposed.Firstly,it found a segment satisfies the edit distance segment mapping,then based on this segment,the read length of 500 bp was segmented.Then the edit distance of right and left parts was calculated and all parts sum to the final result.Finally,based on the longest common substring,the number of segments to be verified is minimized,and an optimization scheme was obtained.The experimental result show that this verification algorithm reduces the time of the verification step,and ensures the zero false positive and false negative rate.
作者 于长永 李淼淼 赵楚 马海涛 YU Chang-yong;LI Miao-miao;ZHAO Chu;MA Hai-tao(School of Computer&Communication Engineering,Northeastern University at Qinhuangdao,Qinhuangdao 066004,China)
出处 《东北大学学报(自然科学版)》 EI CAS CSCD 北大核心 2019年第11期1543-1548,共6页 Journal of Northeastern University(Natural Science)
基金 国家自然科学基金资助项目(61772124,61401080,61402087) 河北省自然科学基金资助项目(F2015501049) 河北省教育厅项目(QN2014339) 中央高校基本科研业务费专项资金资助项目(N150402002)
关键词 读取映射 编辑距离 相似性查询 鸽笼原理 确认算法 read mapping edit distance similarity search pigonhole principle verification algorithm
  • 相关文献

参考文献1

二级参考文献8

  • 1Jokinen P, Ukkonen E. Two Algorithms for Approximate String Matching in Static Texts[M]. Mathematical Foundations of Computer Science 1991. Springer Berlin Heidelberg, 1991:240-248.
  • 2Burkhardt S, Crauser A, Ferragina P, et al. Q-gram Based Database Searching Using a Suffix Array ( QUASAR ) [C]. Proceedings of the Third Annual International Conference on Computational Molecular Biology. ACM,1999:77-83.
  • 3Gravano L, Ipeirotis P G, Jagadish H V, et al. Approximate String Joins in a Database(almost)for Free[C]. VLDB. 2001, 1:491-500.
  • 4Li C, Lu J, Lu Y. Efficient Merging and Filtering Algorithms for Approximate String Searches[C]. Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on. IEEE, 2008:257-266.
  • 5Xiao C, Wang W, Lin X. Ed-join: an Efficient Algorithm for Similarity Joins with Edit Distance Constraints[J]. Proceedings of the VLDB Endowment, 2008, 1 (1): 933-9d4.
  • 6Sutinen E, Tarhio J. On Using Q-gram Locations in Approximate String Matching[M]. Algorithms--ESA'95. Springer Berlin Heidelberg, 1995:327-340.
  • 7Califano A, Rigoutsos I. FLASH: A Fast Look-Up Algorithm for String Homology[C]. Computer Vision and Pattern Recognition, 1993. Proceedings CVPR'93., 1993 IEEE Computer Society Conference on. IEEE, 1993:353-359.
  • 8Kernighan B W, Ritchie D M. The C Programming Language[M]. Englewood Cliffs: Prentice-Hall, 1988.

共引文献3

同被引文献24

引证文献2

二级引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部