Gap Filling Method Based on Long Reads and Multiple Sequences Alignment (cited by: 1)
Abstract Gap filling helps produce more complete and accurate genome sequences, which supports research on gene expression and regulation, structural variation analysis, and species evolution. Although many gap filling methods have been proposed, the accuracy and completeness of their results still need improvement. This paper proposes GapLM, a gap filling method based on long reads and multiple sequence alignment. GapLM first splits the set of gap-containing sequences into a set of gap-free sequences, then corrects the alignment results based on differences between the alignment positions on the long reads and on the sequences. By analyzing the alignments, it determines three sets of sequences for each gap region: those covering its left side, those covering its right side, and those spanning it. For each gap and its three associated sets, a multiple sequence alignment method processes and merges the sequences in the three sets and generates a consensus sequence to fill the gap region. GapLM is compared with three gap filling methods, GMcloser, PBjelly, and LR_Gapcloser, on two real datasets, and the experimental results show that GapLM produces more complete and accurate filling results.
Authors WU Dong; WEI Yawei; LUO Junwei; AO Shan (School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China)
Source Computer Engineering (《计算机工程》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2021, No. 11, pp. 93-99, 107 (8 pages in total)
Funding General Program of the National Natural Science Foundation of China (61972134); Young Scientists Fund of the National Natural Science Foundation of China (61602156).
Keywords gap filling; sequence assembly; third-generation sequencing technology; multiple sequence alignment; long reads
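
The three steps named in the abstract (splitting at gaps, grouping reads into left, right, and spanning sets, and building a consensus) can be illustrated with a minimal Python sketch. This is not the GapLM implementation. The function names, the `min_anchor` threshold, the use of runs of `N` to mark gaps, and the simple column-wise majority vote standing in for a real multiple sequence alignment are all assumptions made here for illustration.

```python
import re
from collections import Counter


def split_scaffold(seq, min_gap=1):
    """Split a scaffold at runs of 'N' (assumed gap marker).

    Returns (contigs, gaps): contigs as (offset, sequence) pairs and
    gaps as half-open (start, end) intervals on the scaffold.
    """
    contigs, gaps = [], []
    prev_end = 0
    for m in re.finditer(r"[Nn]{%d,}" % min_gap, seq):
        if m.start() > prev_end:
            contigs.append((prev_end, seq[prev_end:m.start()]))
        gaps.append((m.start(), m.end()))
        prev_end = m.end()
    if prev_end < len(seq):
        contigs.append((prev_end, seq[prev_end:]))
    return contigs, gaps


def classify_read(read_start, read_end, gap_start, gap_end, min_anchor=2):
    """Assign a read to the left, right, or spanning set of one gap.

    The read extent [read_start, read_end) is assumed to be already
    projected onto scaffold coordinates. min_anchor is a hypothetical
    minimum number of flanking bases the read must cover.
    """
    anchored_left = read_start <= gap_start - min_anchor
    anchored_right = read_end >= gap_end + min_anchor
    if anchored_left and anchored_right:
        return "spanning"
    if anchored_left and read_end > gap_start:
        return "left"
    if anchored_right and read_start < gap_end:
        return "right"
    return None  # the read does not reach into this gap


def consensus(aligned_rows):
    """Column-wise majority vote over equal-length, pre-aligned rows.

    '-' marks an alignment gap; columns whose majority symbol is '-'
    are dropped. A real pipeline would obtain the rows from a multiple
    sequence aligner first.
    """
    out = []
    for column in zip(*aligned_rows):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            out.append(base)
    return "".join(out)


contigs, gaps = split_scaffold("ACGTNNNNTTGA")
labels = [classify_read(s, e, *gaps[0]) for s, e in [(0, 12), (1, 6), (6, 12)]]
filled = consensus(["AC-GT", "ACTGT", "AC-GA", "AC-GT"])
```

In this toy run the scaffold yields two contigs around one gap, the three reads fall into the spanning, left, and right sets respectively, and the vote over four aligned rows drops the column where most rows have `-`.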