
SuperLLEC: New Assembly and Error Correction Algorithm for Long Reads and Linked-Reads
Abstract: To reduce the high error rate of third-generation sequencing data and improve the accuracy of genome assembly, SuperLLEC, an assembly and error correction algorithm for PacBio long reads, is designed on the basis of 10X Genomics linked-read sequencing data. The algorithm uses Wtdbg2 to assemble the PacBio long reads of a genome into scaffolds, aligns the linked-reads to these scaffolds with Bowtie2, and further assembles the scaffolds according to the linked-read barcodes. At each mismatched alignment site, Fisher's exact test is used to predict whether the site is a single nucleotide polymorphism (SNP) or a PacBio sequencing error. Comparison experiments on long-read and linked-read data from three groups of human cells show that SuperLLEC noticeably improves the accuracy of genome assembly, increases NG50 length, and predicts SNP sites more accurately.
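The abstract does not say how the contingency table for Fisher's exact test is built, so the following is only a minimal sketch under stated assumptions. It assumes a 2x2 table of allele counts at one mismatch site (rows: linked-reads and long reads; columns: reads supporting the alternative base and reads supporting the scaffold base). The function names, the 0.05 threshold, and the decision rule are hypothetical illustrations and are not taken from SuperLLEC.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely
    # than the observed one (small tolerance for float rounding).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def classify_mismatch(linked_alt, linked_ref, long_alt, long_ref, alpha=0.05):
    """Hypothetical rule: if the two read types disagree significantly about
    the allele proportions, treat the site as a sequencing error; otherwise
    keep it as an SNP candidate. This is an assumption, not the paper's rule."""
    p = fisher_exact_two_sided(linked_alt, linked_ref, long_alt, long_ref)
    label = "sequencing_error" if p < alpha else "snp_candidate"
    return label, p


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(classify_mismatch(10, 10, 1, 19))
    print(classify_mismatch(10, 10, 9, 11))
```

The test is written with `math.comb` so that the sketch has no third-party dependencies. In practice a library routine such as `scipy.stats.fisher_exact` could be used for the p-value instead.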
Authors: CUI Yaxuan; ZHANG Shaoqiang (College of Computer Information and Engineering, Tianjin Normal University, Tianjin 300387, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2022, Issue 3, pp. 201-206 (6 pages)
Funding: National Natural Science Foundation of China (61572358); Key Project of the Natural Science Foundation of Tianjin (19JCZDJC35100).
Keywords: linked-reads; long-reads; scaffolds; assembly; error correction; Fisher's exact test
