Quinoa (Chenopodium quinoa Willd.) is a halophytic, allotetraploid grain crop of the Amaranthaceae family with impressive drought tolerance, nutritional content and an increasing worldwide market. Here we report the results of an RNA-seq transcriptome analysis of Chenopodium quinoa using four water treatments (field capacity to drought) on the varieties “Ingapirca” (representing valley ecotypes) and “Ollague” (representing Altiplano Salares ecotypes). Physiological results, including growth rate, photosynthetic rate, stomatal conductance, and stem water potential, support the earlier findings that the Altiplano Salares ecotypes display greater tolerance to drought-like stress conditions than the valley ecotypes. cDNA libraries from root tissue samples for each variety × treatment combination were sequenced using Illumina Hi-Seq technology in an RNA-seq experiment. De novo assembly of the transcriptome generated 20,337 unique transcripts. Gene expression analysis of the RNA-seq data identified 462 putative gene products that showed differential expression based on treatment, and 27 putative gene products differentially expressed based on variety × treatment, including significant expression differences in root tissue in response to increasing water stress. BLAST searches and gene ontology analysis show an overlap between drought tolerance stress and other abiotic stress mechanisms.
Salt stress is an abiotic stress for plants, especially in saline lakes. Dunaliella, a halophilic microalga distributed throughout salt lakes and seas, can respond to different salinity stresses by regulating the expression of some genes. However, these genes and the functions and biological processes they are involved in remain unclear. Profiling these salt-stress-related genes in a high-salt-tolerant Dunaliella species will help clarify the salt tolerance machinery of Dunaliella. Three D. salina_YC salt-stress groups were tested under low (0.51 mol/L), moderate (1.03 mol/L), and high (3.42 mol/L) NaCl concentrations, along with one control group under a very low (0.05 mol/L) NaCl concentration, and three deep-sequenced, de novo assembled transcriptomes were obtained per group. Twelve high-quality RNA-seq libraries were obtained, with 46,585 upregulated and 47,805 downregulated unigenes found. Relative to the control, 188 common differentially expressed genes (DEGs) were screened and divided into four clusters by expression pattern. Fifteen of them, annotated in significantly enriched Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, were validated via qPCR. Their qPCR-based relative expression patterns were similar to their RNA-seq-based patterns. Two significant DEGs, the geranylgeranyl diphosphate synthase coding gene (1876-bp cDNA) and the diacylglycerol O-acyltransferase coding gene (2968-bp cDNA), were cloned and analyzed in silico. The total lipid content, superoxide dismutase specific activity, and beta-carotene content of D. salina_YC increased gradually with increasing salinity. In addition, the expression of 11 validated genes involved in fatty acid biosynthesis/degradation, active oxygen metabolism or carotenoid metabolism showed significant changes. Algal photochemical efficiency also diminished with increasing salinity, as did the expression of 4 photosynthesis-related genes. These results could help clarify the molecular mechanisms underlying D. salina responses to the Yuncheng Salt Lake environment and lay a foundation for further utilization of this algal resource.
Korla fragrant pear (KFP), known for its special fragrance, is a unique cultivar in Xinjiang, China. To explore the molecular mechanism of chlorogenic acid (CGA) biosynthesis in KFP, samples at different developmental periods were collected for transcriptome analysis. High performance liquid chromatography analysis showed that the CGA contents of KFP at 88, 118 and 163 days after full bloom were (20.96±1.84), (12.01±0.91) and (7.16±0.41) mg/100 g, respectively, decreasing as the fruit developed. Pears from these 3 typical periods were selected for de novo transcriptome assembly, and 68,059 unigenes were assembled from 444,037,960 clean reads. One ‘phenylpropanoid biosynthesis’ pathway related to CGA biosynthesis was determined, comprising 57 unigenes: 11 PALs, 1 PTAL, 6 4CLs, 9 C4Hs, 25 HCTs and 5 C3’Hs. The expression levels of 11 differentially expressed genes, including 1 PAL, 2 C4Hs, 3 4CLs and 5 HCTs, were consistent with the change in CGA content. Quantitative polymerase chain reaction analysis further showed that the expression of 8 unigenes involved in CGA biosynthesis was consistent with the RNA-seq data. These findings will provide a comprehensive understanding of, and valuable information for, genetic engineering and molecular breeding in KFP.
Superior inbred lines are central to maize breeding as sources of natural variation. Although many elite lines have been sequenced, less sequencing attention has been paid to newly developed lines. We constructed a genome assembly of the elite inbred line KA105, which has recently been developed by an artificial breeding population named Shaan A and has shown desirable characteristics for breeding. Its pedigree showed genetic divergence from B73 and other lines in its pedigree. Comparison with the B73 reference genome revealed extensive structural variation, 58 presence/absence variation (PAV) genes, and 1023 expanded gene families, some of which may be associated with disease resistance. A network-based integrative analysis of stress-induced transcriptomes identified 13 KA105-specific PAV genes, of which eight were induced by at least one kind of stress, participating in gene modules responding to stress such as drought and southern leaf blight disease. More than 200,000 gene pairs were differentially correlated between KA105 and B73 during kernel development. The KA105 reference genome and transcriptome atlas are a resource for further germplasm improvement and surveys of maize genomic variation and gene function.
Special xylem tissue called “compression wood” is formed on the lower side of inclined stems when gymnosperms grow on a slope. We investigated the molecular mechanism of compression wood formation. Transcriptome analysis by next-generation sequencing (NGS) was applied to the xylem of Chamaecyparis obtusa to develop a catalog of general gene expression in differentiating xylem during compression and normal wood formation. The sequencing output generated 234,924,605 reads and 40,602 contigs (mean size = 529 bp). Based on a sequence similarity search with known proteins, 54.2% (22,005) of the contigs showed homology with sequences in the databases. Of these annotated contigs, 19,293 contigs were assigned to Gene Ontology categories. Differential gene expression between the compression and normal wood libraries was analyzed by mapping the reads from each library to the assembled contigs. In total, 2875 contigs were identified as differentially expressed, including 1207 that were up-regulated and 1668 that were down-regulated in compression wood. We selected 30 genes and compared the transcript abundance between compression and normal wood by quantitative polymerase chain reaction analysis to validate the NGS results. We found that 27 of the 30 genes showed the same expression patterns as the original NGS results.
Panax ginseng C. A. Meyer is an important traditional herb in eastern Asia. It contains ginsenosides, which are primary bioactive compounds with medicinal properties. Although ginseng has been cultivated since at least the Ming dynasty to increase production, cultivated ginseng has lower quantities of ginsenosides and lower disease resistance than ginseng grown under natural conditions. We extracted root RNA from six varieties of fifth-year P. ginseng cultivars representing four different growth conditions, and performed Illumina paired-end sequencing. In total, 163,165,706 raw reads were obtained and used to generate a de novo transcriptome that consisted of 151,763 contigs (76,336 unigenes), of which 100,648 contigs (66.3%) were successfully annotated. Differential expression analysis revealed that most differentially expressed genes (DEGs) were upregulated (246 out of 258, 95.3%) in ginseng grown under natural conditions compared with that grown under artificial conditions. These DEGs were enriched in gene ontology (GO) terms including response to stimuli and localization. In particular, some key ginsenoside biosynthesis-related genes, including HMG-CoA synthase (HMGS), mevalonate kinase (MVK), and squalene epoxidase (SE), were upregulated in wild-grown ginseng. Moreover, a high proportion of disease resistance-related genes were upregulated in wild-grown ginseng. This study is the first transcriptome analysis to compare wild-grown and cultivated ginseng, and identifies genes that may produce higher ginsenoside content and better disease resistance in the wild; these genes may have the potential to improve cultivated ginseng grown in artificial environments.
Nannochloropsis is rapidly emerging as a model organism for the study of biofuel production in microalgae. Here, we report a high-quality genomic assembly of Nannochloropsis gaditana, consisting of large contigs, up to 500 kbp long, and scaffolds that in most cases span the entire length of the chromosomes. We identified 10,646 complete genes and characterized possible alternative transcripts. The annotation of the predicted genes and the analysis of cellular processes revealed traits relevant for the genetic improvement of this organism, such as genes involved in DNA recombination, RNA silencing, and cell wall synthesis. We also analyzed the modification of the transcriptional profile in nitrogen deficiency, a condition known to stimulate lipid accumulation. While the content of lipids increased, we did not detect major changes in expression of the genes involved in their biosynthesis. At the same time, we observed a very significant down-regulation of mitochondrial gene expression, suggesting that part of the acetyl-CoA and NAD(P)H, normally oxidized through mitochondrial respiration, would be made available for fatty acid synthesis, increasing the flux through the lipid biosynthetic pathway. Finally, we released an information resource of the genomic data of N. gaditana, available online at www.nannochloropsis.org.
De novo transcriptome assembly is an important approach in RNA-Seq data analysis, and it can help us to reconstruct the transcriptome and investigate gene expression profiles without reference genome sequences. We carried out transcriptome assemblies with two RNA-Seq datasets generated from human brain and a cell line, respectively. We then determined an efficient way to yield an optimal overall assembly using three different strategies. We first assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of values of k-mer length and coverage cutoff in assembling. Lastly, we combined the assembled contigs from a range of k values to generate a final assembly. By comparing these assembly results, we found that using only one k-mer value for assembly is not enough to generate good assembly results, but combining the contigs from different k-mer values could yield longer contigs and greatly improve the overall assembly.
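The third strategy in the abstract above, pooling contigs assembled at several k values, can be illustrated with a minimal sketch. The abstract does not say how the contigs were combined, so the redundancy rule here (drop a contig when it is an exact substring of a longer kept contig) is an assumption, and the function name `merge_assemblies` and the toy sequences are hypothetical. Reverse complements and near-identical contigs are ignored for brevity.

```python
def merge_assemblies(assemblies):
    """Pool contigs from several assemblies and drop redundant ones.

    `assemblies` maps a k-mer length to the list of contigs assembled
    with that k. A contig counts as redundant when it is an exact
    substring of a longer contig already kept (a simplifying assumption,
    not the method of any particular assembler).
    """
    # Collapse exact duplicates across k values first.
    pooled = {contig for contigs in assemblies.values() for contig in contigs}
    kept = []
    # Visit longest contigs first so shorter ones are compared against them.
    for contig in sorted(pooled, key=lambda c: (-len(c), c)):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


# Hypothetical toy contigs keyed by the k used to assemble them.
toy_assemblies = {
    21: ["ATGGCGTACGTTAGC", "GGCATTA"],
    25: ["ATGGCGTACGTTAGCCTGA", "GGCATTA"],
    31: ["TTAGCCTGA", "CCCGGGAAATTT"],
}
merged = merge_assemblies(toy_assemblies)
print(merged)
```

In this toy input the 15-base and 9-base contigs are both contained in the 19-base contig, so the merged set keeps only three sequences.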
The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. The de Bruijn graph, a fast algorithm, has been successfully used for de novo assembly of genomic DNA; nevertheless, its performance for transcriptome assembly is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. Of these assemblers, ABySS, Trinity, Velvet and Oases are all based on the de Bruijn graph, and Mira uses an overlap graph algorithm. Various numbers of RNA short reads were selected from the External RNA Control Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the resulting contigs from each assembler. Each experiment was repeated multiple times to obtain the mean statistics and standard error estimate. Trinity had relatively good performance for both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method, but its assembly quality was low. Mira gave a good rate for mapping its contigs onto human chromosome 22, but its computational speed was not satisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community. Therefore, a novel assembler is needed for assembling transcriptome data generated by next-generation sequencing techniques.
Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To find the similarities and differences between them, we first chose five representative assemblers belonging to the two classes, and then investigated and compared their algorithmic features in theory and their real performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they have different conceptual and practical implementations; thus each assembly method has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler using a genome-guided de novo assembly approach, and achieved good performance. Based on these results, we suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
Microtus fortis is the only mammalian host that exhibits intrinsic resistance against Schistosoma japonicum infection. However, the underlying molecular mechanisms of this resistance are not yet known. Here, we perform the first de novo genome assembly of M. fortis, comprehensive gene annotation analysis, and evolution analysis. Furthermore, we compare the recovery rate of schistosomes, pathological changes, and liver transcriptomes between M. fortis and mice at different time points after infection. We observe that the time and type of immune response in M. fortis are different from those in mice. M. fortis activates immune and inflammatory responses on the 10th day post infection, such as leukocyte extravasation, antibody activation, Fc-gamma receptor-mediated phagocytosis, and the interferon signaling cascade, which play important roles in preventing the development of schistosomes. In contrast, an intense immune response occurs in mice at the late stages of infection and cannot eliminate schistosomes. Infected mice suffer severe pathological injury and continuous decreases in cell cycle, lipid metabolism, and other functions. Our findings offer new insights into the intrinsic resistance mechanism of M. fortis against schistosome infection. The genome sequence also provides the basis for future studies of other important traits in M. fortis.
A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ~8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers was analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average polymorphism information content (PIC) value of 0.16. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
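The N50 statistic quoted for CcTA v2 is the contig length at which the contigs of that length or longer together hold at least half of all assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are illustrative assumptions and are not pigeonpea data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (bp), chosen only to show the arithmetic:
# total = 54, half = 27, and 10 + 9 + 8 = 27, so the N50 is 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 weights contigs by their length, a few long contigs can raise it even when most contigs are short, which is why it is usually reported alongside the contig count and the largest contig.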
Reconstruction of the transcriptome by de novo assembly from next generation sequencing (NGS) short-sequence reads provides an essential means to catalog expressed genes, identify splicing isoforms, and capture the expression detail of transcripts for organisms with no reference genome available. De novo transcriptome assembly faces many unique challenges, including alternative splicing, variable expression levels covering a dynamic range of several orders of magnitude, artifacts introduced by reverse transcription, etc. In the current review, we illustrate the grand strategy in applying the De Bruijn Graph (DBG) approach in de novo transcriptome assembly. We further analyze many parameters proven critical in transcriptome assembly using DBG. Among them, k-mer length, coverage depth of reads, genome complexity, and performance of different programs are addressed in greater detail. A multi-k-mer strategy balancing efficiency and sensitivity is discussed and highly recommended for de novo transcriptome assembly. Future direction points to the combination of NGS and third generation sequencing technology that would greatly enhance the power of de novo transcriptomics study.
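To make the role of the k-mer length concrete, the following is a minimal sketch of de Bruijn graph construction, assuming error-free reads from a single strand. Nodes are (k-1)-mers and each k-mer in a read contributes one edge. The function names and toy reads are hypothetical, and real assemblers add coverage tracking, error correction, reverse-complement handling and isoform resolution that are omitted here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the sorted list of nodes it points to."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read; each k-mer is one edge
        # from its (k-1)-base prefix to its (k-1)-base suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(targets) for node, targets in graph.items()}


def spell_unbranched_path(graph, start):
    """Follow edges from `start` while each node has exactly one successor.

    This simplified walk checks out-degree only, so it is an illustration
    of contig spelling rather than a full unitig algorithm.
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads drawn from the toy transcript ATGGCGTAC.
toy_reads = ["ATGGCG", "GGCGTA", "CGTAC"]
graph = build_de_bruijn(toy_reads, k=4)
contig = spell_unbranched_path(graph, "ATG")
print(graph)
print(contig)
```

A smaller k lets reads with short overlaps connect, which helps low-coverage transcripts but creates more branching at repeats; a larger k does the opposite, which is the trade-off a multi-k-mer strategy tries to balance.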
Funding: Supported by the National Natural Science Foundation of China (No. 31670208), the Applied Basic Research Programs of Shanxi Province of China (No. 201801D221242), the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi of China (No. 2019L0041), and the Shanxi “Project 1331”.
Funding: Supported by the Major Scientific and Technological Projects of XPCC (2020KWZ-012).
Funding: Supported by the China Agriculture Research System (CARS-02-77), the Shaanxi Province Research and Development Project (2021LLRH-07), and the Yangling Seed Industry Innovation Center Project (YLZY-YM-01).
Funding: Supported by the International Science and Technology Cooperation of China (2011DFA32730).
Funding: Supported by the National Basic Research Program of China (Grant Nos. 2010CB945401, 2007CB108800), the National Natural Science Foundation of China (Grant Nos. 30870575, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (Grant No. 11DZ2260300).
Abstract: De novo transcriptome assembly is an important approach in RNA-Seq data analysis: it makes it possible to reconstruct the transcriptome and investigate gene expression profiles without reference genome sequences. We carried out transcriptome assemblies with two RNA-Seq datasets generated from human brain and a cell line, respectively. We then determined an efficient way to yield an optimal overall assembly using three different strategies. We first assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of values of k-mer length and coverage cutoff in assembling. Lastly, we combined the assembled contigs from a range of k values to generate a final assembly. By comparing these assembly results, we found that using only one k-mer value for assembly is not enough to generate good assembly results, but combining the contigs from different k-mer values could yield longer contigs and greatly improve the overall assembly.
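The third strategy, pooling contigs assembled at several k values, can be illustrated with a minimal sketch. The abstract does not say how the contigs were combined, so the redundancy rule here (drop any contig that is contained in a longer one, on either strand) is an assumption, and `merge_assemblies` and the toy contigs in `by_k` are hypothetical names and data.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_assemblies(assemblies):
    """Pool contigs from several assemblies and drop redundant ones.

    `assemblies` maps a k value to the list of contigs assembled with it.
    Assumed rule (not from the source): a contig is redundant if it, or its
    reverse complement, is a substring of a longer contig already kept.
    """
    unique = {c for contigs in assemblies.values() for c in contigs}
    # Longest first; ties broken alphabetically so the result is deterministic.
    pooled = sorted(unique, key=lambda c: (-len(c), c))
    kept = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


# Hypothetical toy contigs from three single-k assemblies.
by_k = {
    21: ["ACGTACGTAA", "TTGCA"],
    25: ["ACGTACGTAAGG", "TTGCA"],
    31: ["CCTTACGTACGT"],  # reverse complement of the k=25 contig
}
merged = merge_assemblies(by_k)
print(merged)
```

Real pipelines usually cluster contigs by sequence similarity rather than exact containment; the quadratic scan above is only meant to show the idea on small inputs.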
Funding: Supported by grants from the National Center for Research Resources (5P20RR016471-12) and the National Institute of General Medical Sciences (8 P20 GM103442-12) from the National Institutes of Health, and by the seed collaborative research grant from the Odegard School of Aerospace Sciences and the School of Medicine and Health Sciences at University of North Dakota.
Abstract: The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. A fast algorithm, the de Bruijn graph, has been successfully used for de novo assembly of genomic DNA; nevertheless, its performance for transcriptome assembly is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. Of these assemblers, ABySS, Trinity, Velvet and Oases are all based on the de Bruijn graph, and Mira uses an overlap graph algorithm. Various numbers of RNA short reads were selected from the External RNA Control Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the resulting contigs from each assembler. Each experiment was repeated multiple times to obtain the mean statistics and standard error estimate. Trinity had relatively good performance for both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method, but its assembly quality was low. Mira gave a good rate for mapping its contigs onto human chromosome 22, but its computational speed was not satisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community. Therefore, a novel assembler is needed for assembling transcriptome data generated by next-generation sequencing techniques.
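The repeated-experiment summary described above (mean and standard error of a statistic across runs) can be sketched as follows. The abstract does not list the statistics or the number of repeats, so the values in `stat_per_run` are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of runs.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for repeated measurements.

    Uses the sample standard deviation (n - 1 denominator), so at least
    two values are required.
    """
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical values of one contig statistic from five repeated runs.
stat_per_run = [1200, 1250, 1180, 1230, 1240]
mean, sem = mean_and_sem(stat_per_run)
print(f"mean = {mean:.1f}, standard error = {sem:.2f}")
```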
Funding: Supported by the National Basic Research Program of China (2010CB945401), the National Natural Science Foundation of China (31240038, 31171264, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (11DZ2260300).
Abstract: Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. In order to find the similarities and differences between them, we first chose five representative assemblers belonging to the two classes, and then investigated and compared their algorithmic features in theory and their real performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they have different conceptual and practical implementations; thus each assembly method has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler using a genome-guided de novo assembly approach, and achieved good performance. Based on these results, we suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
Funding: This work was supported by the Key Project in the National Science & Technology Pillar Program from the Ministry of Science and Technology (2015BAI09B04), the National Natural Science Foundation of China (31872256, 31472188), the National Key Research and Development Program of China (2017YFD0501306), the Science and Technology Service Network Initiative of Chinese Academy of Sciences (KFJ-STS-QYZD-126, ZDBS-SSW-DQC-02), the CAS Youth Innovation Promotion Association, and the SA-SIBS Scholarship Program.
Abstract: Microtus fortis is the only mammalian host that exhibits intrinsic resistance against Schistosoma japonicum infection. However, the underlying molecular mechanisms of this resistance are not yet known. Here, we perform the first de novo genome assembly of M. fortis, comprehensive gene annotation analysis, and evolution analysis. Furthermore, we compare the recovery rate of schistosomes, pathological changes, and liver transcriptomes between M. fortis and mice at different time points after infection. We observe that the time and type of immune response in M. fortis are different from those in mice. M. fortis activates immune and inflammatory responses on the 10th day post infection, such as leukocyte extravasation, antibody activation, Fc-gamma receptor-mediated phagocytosis, and the interferon signaling cascade, which play important roles in preventing the development of schistosomes. In contrast, an intense immune response occurs in mice at the late stages of infection but cannot eliminate schistosomes. Infected mice suffer severe pathological injury and continuous decreases in cell cycle, lipid metabolism, and other functions. Our findings offer new insights into the intrinsic resistance mechanism of M. fortis against schistosome infection. The genome sequence also provides the basis for future studies of other important traits in M. fortis.
Abstract: A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ~8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers was analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average polymorphism information content (PIC) value of 0.16. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
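The N50 figure quoted for CcTA v2 is a standard assembly summary: the contig length at which contigs of that length or longer account for at least half of the total assembled bases. A minimal sketch of that common definition follows; the contig lengths in the example are hypothetical and are not taken from the pigeonpea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are walked from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths in bp: total 1500, half 750.
# 500 + 400 = 900 >= 750, so the N50 is 400.
print(n50([100, 200, 300, 400, 500]))
```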
Funding: This work is supported in part by grants from the National Basic Research Program of China (Nos. 2012CB316501, 2012CB517905 and 2013CB127000) and the National Natural Science Foundation of China (Nos. 31571310 and 31271409).
Abstract: Reconstruction of the transcriptome by de novo assembly from next generation sequencing (NGS) short-sequence reads provides an essential means to catalog expressed genes, identify splicing isoforms, and capture the expression detail of transcripts for organisms with no reference genome available. De novo transcriptome assembly faces many unique challenges, including alternative splicing, variable expression levels covering a dynamic range of several orders of magnitude, artifacts introduced by reverse transcription, etc. In the current review, we illustrate the grand strategy in applying the De Bruijn Graph (DBG) approach in de novo transcriptome assembly. We further analyze many parameters proven critical in transcriptome assembly using DBG. Among them, k-mer length, coverage depth of reads, genome complexity, and performance of different programs are addressed in greater detail. A multi-k-mer strategy balancing efficiency and sensitivity is discussed and highly recommended for de novo transcriptome assembly. The future direction points to the combination of NGS and third generation sequencing technology, which would greatly enhance the power of de novo transcriptomics studies.
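To make the role of k-mer length concrete, here is a minimal sketch of building a de Bruijn graph from reads. It follows the textbook construction (nodes are (k-1)-mers, each k-mer contributes one edge) and is not the implementation of any assembler named in these abstracts; `build_de_bruijn` and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Each k-mer adds an edge from its (k-1)-mer prefix to its (k-1)-mer
    suffix. Edge weights count how often the k-mer was seen, a rough
    proxy for coverage. Reads shorter than k contribute nothing.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


# Two hypothetical overlapping reads.
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn(reads, k=4)
for prefix, successors in sorted(graph.items()):
    print(prefix, dict(successors))
```

A smaller k gives more overlaps between reads, which helps weakly expressed transcripts, while a larger k resolves more repeats but needs deeper coverage. That trade-off is the usual motivation for assembling at several k values and combining the results.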