
Time-space Efficient Short-read Alignment with Inserting Gaps

Cited by: 1
Abstract: Short-read alignment is widely used in next-generation sequencing. Accurately identifying gaps in sequenced reads is the basis for subsequent genome interpretation, yet existing gapped short-read alignment algorithms either perform poorly or do not allow gap insertion at all. For the alignment problem in which both the query and the reference sequences are short reads, this paper trains on sample query-sequence data to find the optimal number of inserted gaps for different species and different read lengths, and then performs pairwise alignment on large-scale short reads. This reduces the number of iterations of the algorithm and hence the amount of intermediate-matrix computation. The intermediate matrix element values are stored in vectors during alignment, which lowers the storage requirement. Together these steps yield an improved short-read alignment algorithm. Alignment experiments on tens of millions of short reads show that, compared with representative existing algorithms of the same kind, the proposed algorithm reduces the required running time and storage space while preserving short-read alignment accuracy.
Authors: YANG Yong-jie (杨永洁); ZHONG Cheng (钟诚) (School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 5, pp. 1004-1009 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (61462005) and the Guangxi Natural Science Foundation (2014GXNSFAA118396)
Keywords: short-read alignment; pairwise sequence alignment; dynamic programming; gap identification
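
The two ideas named in the abstract, limiting the number of inserted gaps and keeping intermediate matrix values in vectors, can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not give in detail. It assumes unit costs for mismatches and gaps, approximates the gap limit with a diagonal band of half-width `k`, and keeps two row vectors instead of the full dynamic-programming matrix. The names `banded_distance` and `smallest_sufficient_band`, and the idea of choosing `k` by comparing against a wider band on a sample of read pairs, are hypothetical.

```python
from typing import List, Optional, Tuple

INF = float("inf")


def banded_distance(a: str, b: str, k: int) -> Optional[int]:
    """Unit-cost alignment distance restricted to cells with |i - j| <= k.

    Only two row vectors are kept, not the full (len(a)+1) x (len(b)+1) matrix.
    Returns None when the length difference alone exceeds the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [INF] * (m + 1)
    for j in range(min(m, k) + 1):
        prev[j] = j
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)  # cells outside the band stay at INF
        lo, hi = max(0, i - k), min(m, i + k)
        for j in range(lo, hi + 1):
            if j == 0:
                curr[0] = i  # a[:i] against the empty prefix of b
                continue
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            gap_in_b = prev[j] + 1      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + 1  # b[j-1] aligned to a gap
            curr[j] = min(substitute, gap_in_b, gap_in_a)
        prev = curr
    return prev[m]


def smallest_sufficient_band(pairs: List[Tuple[str, str]], max_band: int) -> int:
    """Hypothetical 'training' step: the smallest k that reproduces, on every
    sample pair, the result obtained with the widest band considered."""
    for k in range(max_band + 1):
        if all(banded_distance(a, b, k) == banded_distance(a, b, max_band)
               for a, b in pairs):
            return k
    return max_band


if __name__ == "__main__":
    sample = [("ACGT", "AGT"), ("ACGT", "ACCT")]
    k = smallest_sufficient_band(sample, max_band=3)
    print(k, [banded_distance(a, b, k) for a, b in sample])
```

A narrower band means fewer cells are evaluated per row, which is one plausible reading of how a learned gap limit reduces intermediate-matrix computation. A band bounds the net diagonal offset, not the exact gap count, so this is a relaxation of a strict gap limit.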