In this paper, we show that for a locally LEW-embedded 3-connected graph G in orientable surface, the following results hold: 1) Each of such embeddings is minimum genus embedding; 2) The facial cycles are precisel...In this paper, we show that for a locally LEW-embedded 3-connected graph G in orientable surface, the following results hold: 1) Each of such embeddings is minimum genus embedding; 2) The facial cycles are precisely the induced nonseparating cycles which implies the uniqueness of such embeddings; 3) Every overlap graph O(G, C) is a bipartite graph and G has only one C-bridge H such that C U H is nonplanar provided C is a contractible cycle shorter than every noncontractible cycle containing an edge of C. This extends the results of C Thomassen's work on LEW-embedded graphs.展开更多
Background:De novo genome assembly relies on two kinds of graphs:de Bruijn graphs and overlap graphs.Overlap graphs are the basis for the Celera assembler,while de Bruijn graphs have become the dominant technical devi...Background:De novo genome assembly relies on two kinds of graphs:de Bruijn graphs and overlap graphs.Overlap graphs are the basis for the Celera assembler,while de Bruijn graphs have become the dominant technical device in the last decade.Those two kinds of graphs are collectively called assembly graphs.Results:In this review,we discuss the most recent advances in the problem of constructing,representing and navigating assembly graphs,focusing on very large datasets.We will also explore some computational techniques,such as the Bloom filter,to compactly store graphs while keeping all functionalities intact.Conclusions:We complete our analysis with a discussion on the algorithmic issues of assembling from long reads(eg.,PacBio and Oxford Nanopore).Finally,we present some of the most relevant open problems in this field.展开更多
The undirected de Bruijn graph is often used as the model of communication network for its useful properties, such as short diameter, small maximum vertex degree. In this paper, we consider the alphabet overlap graph ...The undirected de Bruijn graph is often used as the model of communication network for its useful properties, such as short diameter, small maximum vertex degree. In this paper, we consider the alphabet overlap graph G(k, d, s): the vertex set V = {v|v = (v1...vk); vi ∈ {1,2,... ,d}, i = 1,2,... ,k}; they are distinct and two vertices u = (ul...uk) and v = (vl... vk) are adjacent if and only if us+i = vi or vs+i = ui (i = 1,2,...,k - s). In particular, when s = 1, G(k,d,s) is just an undirected de Bruijn graph. First, we give a formula to calculate the vertex degree of G(k, d, s). Then, we use the corollary of Menger's theorem to prove that the connectivity of G(k, d, s) is 2d^s - 2d^2s-k for s ≥ k/2.展开更多
The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis.A fast algorithm,de Bruijn graph has been successfully used for genome DNA de nov...The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis.A fast algorithm,de Bruijn graph has been successfully used for genome DNA de novo assembly;nevertheless,its performance for transcriptome assembly is unclear.In this study,we used both simulated and real RNA-Seq data,from either artificial RNA templates or human transcripts,to evaluate five de novo assemblers,ABySS,Mira,Trinity,Velvet and Oases.Of these assemblers,ABySS,Trinity,Velvet and Oases are all based on de Bruijn graph,and Mira uses an overlap graph algorithm.Various numbers of RNA short reads were selected from the External RNA Control Consortium(ERCC) data and human chromosome 22.A number of statistics were then calculated for the resulting contigs from each assembler.Each experiment was repeated multiple times to obtain the mean statistics and standard error estimate.Trinity had relative good performance for both ERCC and human data,but it may not consistently generate full length transcripts.ABySS was the fastest method but its assembly quality was low.Mira gave a good rate for mapping its contigs onto human chromosome 22,but its computational speed is not satisfactory.Our results suggest that transcript assembly remains a challenge problem for bioinformatics society.Therefore,a novel assembler is in need for assembling transcriptome data generated by next generation sequencing technique.展开更多
A graph G is super-edge-connected,for short super-λ,if every minimum edge-cut consists of edges adjacent to a vertex of minimum degree.Alphabet overlap graph G(k,d,s)is undirected,simple graph with vertex set V={v...A graph G is super-edge-connected,for short super-λ,if every minimum edge-cut consists of edges adjacent to a vertex of minimum degree.Alphabet overlap graph G(k,d,s)is undirected,simple graph with vertex set V={v|v=1()kv…v;vi∈{1,2,…,d},i=1,…,k}.Two vertices u=(u1…uk)and v=(v1…vk)are adjacent if and only if us+i=vi or vs+i=ui(i=1,…,k-s).In particular G(k,d,1)is just an undirected de Bruijn graph.In this paper,we show that the diameter of G(k,d,s)is k s,the girth is 3.Finally,we prove that G(k,d,s)(s≥k/2)is super-λ.展开更多
The recent breakthroughs in next-generation sequencing technologies, such as those of Roche 454,Illumina/Solexa, and ABI SOLID, have dramatically reduced the cost of producing short reads of the genome of new species....The recent breakthroughs in next-generation sequencing technologies, such as those of Roche 454,Illumina/Solexa, and ABI SOLID, have dramatically reduced the cost of producing short reads of the genome of new species. The huge volume of reads, along with short read length, high coverage, and sequencing errors, poses a great challenge to de novo genome assembly. However, the paired-end information provides a new solution to these problems. In this paper, we review and compare some current assembly tools, including Newbler, CAP3, Velvet,SOAPdenovo, AllPaths, Abyss, IDBA, PE-Assembly, and Telescoper. In general, we compare the seed extension and graph-based methods that use the overlap/lapout/consensus approach and the de Bruijn graph approach for assembly. At the end of the paper, we summarize these methods and discuss the future directions of genome assembly.展开更多
基金Supported by NNSF of China(10271048,10671073)Supported by Science and Technology Commission of Shanghai Municipality(07XD14011)Supported by Shanghai Leading Academic Discipline Project(B407)
文摘In this paper, we show that for a locally LEW-embedded 3-connected graph G in orientable surface, the following results hold: 1) Each of such embeddings is minimum genus embedding; 2) The facial cycles are precisely the induced nonseparating cycles which implies the uniqueness of such embeddings; 3) Every overlap graph O(G, C) is a bipartite graph and G has only one C-bridge H such that C U H is nonplanar provided C is a contractible cycle shorter than every noncontractible cycle containing an edge of C. This extends the results of C Thomassen's work on LEW-embedded graphs.
文摘Background:De novo genome assembly relies on two kinds of graphs:de Bruijn graphs and overlap graphs.Overlap graphs are the basis for the Celera assembler,while de Bruijn graphs have become the dominant technical device in the last decade.Those two kinds of graphs are collectively called assembly graphs.Results:In this review,we discuss the most recent advances in the problem of constructing,representing and navigating assembly graphs,focusing on very large datasets.We will also explore some computational techniques,such as the Bloom filter,to compactly store graphs while keeping all functionalities intact.Conclusions:We complete our analysis with a discussion on the algorithmic issues of assembling from long reads(eg.,PacBio and Oxford Nanopore).Finally,we present some of the most relevant open problems in this field.
文摘The undirected de Bruijn graph is often used as the model of communication network for its useful properties, such as short diameter, small maximum vertex degree. In this paper, we consider the alphabet overlap graph G(k, d, s): the vertex set V = {v|v = (v1...vk); vi ∈ {1,2,... ,d}, i = 1,2,... ,k}; they are distinct and two vertices u = (ul...uk) and v = (vl... vk) are adjacent if and only if us+i = vi or vs+i = ui (i = 1,2,...,k - s). In particular, when s = 1, G(k,d,s) is just an undirected de Bruijn graph. First, we give a formula to calculate the vertex degree of G(k, d, s). Then, we use the corollary of Menger's theorem to prove that the connectivity of G(k, d, s) is 2d^s - 2d^2s-k for s ≥ k/2.
基金supported by grants from the National Center for Research Resources (5P20RR016471-12)the National Institute of General Medical Sciences (8 P20 GM103442-12) from the National Institutes of Healththe seed collaborative research grant from the Odegard School of Aerospace Sciences and the School of Medicine and Health Sciences at University of North Dakota
文摘The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis.A fast algorithm,de Bruijn graph has been successfully used for genome DNA de novo assembly;nevertheless,its performance for transcriptome assembly is unclear.In this study,we used both simulated and real RNA-Seq data,from either artificial RNA templates or human transcripts,to evaluate five de novo assemblers,ABySS,Mira,Trinity,Velvet and Oases.Of these assemblers,ABySS,Trinity,Velvet and Oases are all based on de Bruijn graph,and Mira uses an overlap graph algorithm.Various numbers of RNA short reads were selected from the External RNA Control Consortium(ERCC) data and human chromosome 22.A number of statistics were then calculated for the resulting contigs from each assembler.Each experiment was repeated multiple times to obtain the mean statistics and standard error estimate.Trinity had relative good performance for both ERCC and human data,but it may not consistently generate full length transcripts.ABySS was the fastest method but its assembly quality was low.Mira gave a good rate for mapping its contigs onto human chromosome 22,but its computational speed is not satisfactory.Our results suggest that transcript assembly remains a challenge problem for bioinformatics society.Therefore,a novel assembler is in need for assembling transcriptome data generated by next generation sequencing technique.
文摘A graph G is super-edge-connected,for short super-λ,if every minimum edge-cut consists of edges adjacent to a vertex of minimum degree.Alphabet overlap graph G(k,d,s)is undirected,simple graph with vertex set V={v|v=1()kv…v;vi∈{1,2,…,d},i=1,…,k}.Two vertices u=(u1…uk)and v=(v1…vk)are adjacent if and only if us+i=vi or vs+i=ui(i=1,…,k-s).In particular G(k,d,1)is just an undirected de Bruijn graph.In this paper,we show that the diameter of G(k,d,s)is k s,the girth is 3.Finally,we prove that G(k,d,s)(s≥k/2)is super-λ.
基金supported in part by the National Natural Science Foundation of China (Nos.61232001,61128006,and 61073036)
文摘The recent breakthroughs in next-generation sequencing technologies, such as those of Roche 454,Illumina/Solexa, and ABI SOLID, have dramatically reduced the cost of producing short reads of the genome of new species. The huge volume of reads, along with short read length, high coverage, and sequencing errors, poses a great challenge to de novo genome assembly. However, the paired-end information provides a new solution to these problems. In this paper, we review and compare some current assembly tools, including Newbler, CAP3, Velvet,SOAPdenovo, AllPaths, Abyss, IDBA, PE-Assembly, and Telescoper. In general, we compare the seed extension and graph-based methods that use the overlap/lapout/consensus approach and the de Bruijn graph approach for assembly. At the end of the paper, we summarize these methods and discuss the future directions of genome assembly.