Transcriptome reconstruction is an important application of RNA-Seq,providing critical information for further analysis of transcriptome.Although RNA-Seq offers the potential to identify the whole picture of transcrip...Transcriptome reconstruction is an important application of RNA-Seq,providing critical information for further analysis of transcriptome.Although RNA-Seq offers the potential to identify the whole picture of transcriptome,it still presents special challenges.To handle these difficulties and reconstruct transcriptome as completely as possible,current computational approaches mainly employ two strategies:de novo assembly and genome-guided assembly.In order to find the similarities and differences between them,we firstly chose five representative assemblers belonging to the two classes respectively,and then investigated and compared their algorithm features in theory and real performances in practice.We found that all the methods can be reduced to graph reduction problems,yet they have different conceptual and practical implementations,thus each assembly method has its specific advantages and disadvantages,performing worse than others in certain aspects while outperforming others in anther aspects at the same time.Finally we merged assemblies of the five assemblers and obtained a much better assembly.Additionally we evaluated an assembler using genome-guided de novo assembly approach,and achieved good performance.Based on these results,we suggest that to obtain a comprehensive set of recovered transcripts,it is better to use a combination of de novo assembly and genome-guided assembly.展开更多
Bioinformatics methods for various RNA-seq data analyses are in fast evolution with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process the RNA-seq data to ob...Bioinformatics methods for various RNA-seq data analyses are in fast evolution with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process the RNA-seq data to obtain accurate and comprehensive results. Here we reviewed the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and transcriptome represent two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts(especially those long noncoding RNAs) still can be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of corresponding analyses.展开更多
基金supported by the National Basic Research Program of China (2010CB945401)the National Natural Science Foundation of China (31240038, 31171264, 31071162, 31000590)the Science and Technology Commission of Shanghai Municipality (11DZ2260300)
文摘Transcriptome reconstruction is an important application of RNA-Seq,providing critical information for further analysis of transcriptome.Although RNA-Seq offers the potential to identify the whole picture of transcriptome,it still presents special challenges.To handle these difficulties and reconstruct transcriptome as completely as possible,current computational approaches mainly employ two strategies:de novo assembly and genome-guided assembly.In order to find the similarities and differences between them,we firstly chose five representative assemblers belonging to the two classes respectively,and then investigated and compared their algorithm features in theory and real performances in practice.We found that all the methods can be reduced to graph reduction problems,yet they have different conceptual and practical implementations,thus each assembly method has its specific advantages and disadvantages,performing worse than others in certain aspects while outperforming others in anther aspects at the same time.Finally we merged assemblies of the five assemblers and obtained a much better assembly.Additionally we evaluated an assembler using genome-guided de novo assembly approach,and achieved good performance.Based on these results,we suggest that to obtain a comprehensive set of recovered transcripts,it is better to use a combination of de novo assembly and genome-guided assembly.
基金supported by the National High Technology Research and Development Program of China(2015AA020104)the China Human Proteome Project(2014DFB30010)+1 种基金the National Science Foundation of China(31471239,to Leming Shi)the 111 Project(B13016)
文摘Bioinformatics methods for various RNA-seq data analyses are in fast evolution with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process the RNA-seq data to obtain accurate and comprehensive results. Here we reviewed the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and transcriptome represent two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts(especially those long noncoding RNAs) still can be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of corresponding analyses.