Abstract
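With the development of new-generation sequencing technologies, new whole-genome assembly algorithms have emerged. In particular, assembly software for the massive volumes of short reads produced by third-generation high-throughput sequencers is continually being developed and is gradually reaching the market. However, because these tools differ in applicability and performance, choosing a well-performing assembler, or developing a parallel, high-throughput one, has become a major challenge. This paper selects four de novo assemblers built on the De Bruijn graph algorithm (Velvet, SOAPdenovo, IDBA, and ABySS) and tests them on simulated genomic data from four species. We analyze the four tools in terms of algorithm, assembly performance, and assembly quality, infer from their algorithmic characteristics the key factors that affect their performance, and offer recommendations for using them as well as points to consider when developing parallel sequence assembly tools for ultra-large-scale genomic data.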
Recently, new sequencing technologies have emerged, a new set of algorithms has been developed, and several assembly software packages have been created specifically for the assembly of next-generation sequencing data. However, because the applicability and performance of these tools are poorly understood, choosing a suitable assembler is a difficult task. Here we compare the performance of Velvet, SOAPdenovo, IDBA, and ABySS, all of which are based on the De Bruijn graph. We compare computational time, assembly accuracy, and integrity. Our comparison will assist researchers in selecting a well-suited assembler and offers essential information for the development of existing assemblers.
Source
《科研信息化技术与应用》
2013, Issue 5, pp. 58-69 (12 pages)
E-science Technology & Application
Funding
National Natural Science Foundation of China (11204342)
Shenzhen Basic Research Fund (JCY20120615140912201)
Shenzhen Peacock Plan (KQCX20130628112914299)