Abstract
Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical and computational challenges in de novo assembly remain, although many new ideas and solutions have been suggested to tackle them in both experimental and computational settings. Results: In this review, we first briefly introduce some of the major challenges faced by NGS sequence assembly. Then, we analyze the characteristics of various sequencing platforms and their impact on assembly results. After that, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based), and introduce the characteristics of each assembly tool and the scenarios to which it is suited. Next, we introduce in detail the solutions to the main challenges of de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing (SMS) data. Finally, we discuss the application of SMS long reads in solving problems encountered in NGS assembly. Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.
Funding
the National Natural Science Foundation of China (Nos. 61732009, 61772557 and 61420106009)
111 Project (No. B18059)
the Fundamental Research Funds for the Central Universities of Central South University (No. 1053320171177)