The white poplar (Populus alba) is widely distributed in Central Asia and Europe. Natural populations of white poplar occur in the Irtysh River basin in China, and the species can also be cultivated and grows well in northern China. In this study, we sequenced the genome of P. alba using single-molecule real-time technology. The de novo assembly of P. alba had a genome size of 415.99 Mb with a contig N50 of 1.18 Mb. A total of 32,963 protein-coding genes were identified, and 45.16% of the genome was annotated as repetitive elements. Genome evolution analysis revealed that the divergence between P. alba and Populus trichocarpa (black cottonwood) occurred ~5.0 Mya (3.0, 7.1). Fourfold synonymous third-codon transversion (4DTv) and synonymous substitution rate (Ks) distributions supported the occurrence of the salicoid WGD event (~65 Mya). Twelve natural populations of P. alba in the Irtysh River basin in China were sequenced to explore genetic diversity. The average pooled heterozygosity of the P. alba populations was 0.170±0.014, lower than that in Italy (0.271±0.051) and Hungary (0.264±0.054). Tajima's D values showed a negative distribution, which might signify an excess of low-frequency polymorphisms and a bottleneck with later expansion of the P. alba populations examined.
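The link between a negative Tajima's D and an excess of low-frequency polymorphisms can be illustrated with a small calculation. The sketch below is not the authors' pipeline (the study used pooled population sequencing); it assumes a toy alignment of individual haplotypes, and the function name `tajimas_d` is hypothetical. It implements the standard Tajima (1989) statistic, which compares the mean pairwise difference with the number of segregating sites scaled by the harmonic number a1.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned haplotypes (toy sketch)."""
    n = len(seqs)
    # Columns of the alignment with more than one allele are segregating sites.
    columns = list(zip(*seqs))
    s = sum(1 for col in columns if len(set(col)) > 1)
    if n < 2 or s == 0:
        return math.nan  # D is undefined without polymorphism
    # Mean number of pairwise differences (pi).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Every variant is a singleton (rare alleles only): D is negative.
rare = ["AAAA", "TAAA", "ATAA", "AATA"]
# Every variant is at 50% frequency: D is positive.
common = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare), tajimas_d(common))
```

In the first toy sample all variants are singletons, so the pairwise diversity falls below the expectation from the number of segregating sites and D is negative, the pattern the abstract interprets as a possible bottleneck followed by expansion.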
Panax ginseng C. A. Meyer is an important traditional herb in eastern Asia. It contains ginsenosides, which are its primary bioactive compounds with medicinal properties. Although ginseng has been cultivated since at least the Ming dynasty to increase production, cultivated ginseng has lower quantities of ginsenosides and lower disease resistance than ginseng grown under natural conditions. We extracted root RNA from six varieties of fifth-year P. ginseng cultivars representing four different growth conditions, and performed Illumina paired-end sequencing. In total, 163,165,706 raw reads were obtained and used to generate a de novo transcriptome that consisted of 151,763 contigs (76,336 unigenes), of which 100,648 contigs (66.3%) were successfully annotated. Differential expression analysis revealed that most differentially expressed genes (DEGs) were upregulated (246 out of 258, 95.3%) in ginseng grown under natural conditions compared with that grown under artificial conditions. These DEGs were enriched in gene ontology (GO) terms including response to stimuli and localization. In particular, some key ginsenoside biosynthesis-related genes, including HMG-CoA synthase (HMGS), mevalonate kinase (MVK), and squalene epoxidase (SE), were upregulated in wild-grown ginseng. Moreover, a high proportion of disease resistance-related genes were upregulated in wild-grown ginseng. This study is the first transcriptome analysis to compare wild-grown and cultivated ginseng, and identifies genes that may produce higher ginsenoside content and better disease resistance in the wild; these genes may have the potential to improve cultivated ginseng grown in artificial environments.
Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To find the similarities and differences between them, we first chose five representative assemblers from the two classes, and then investigated and compared their algorithmic features in theory and their real performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they differ in their conceptual and practical implementations. Each assembly method therefore has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler that uses a genome-guided de novo assembly approach, and it achieved good performance. Based on these results, we suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
Hapalogenys analis (order Lobotiformes) is an economically and ecologically significant fish species. It is a typical sedentary rocky reef fish and is primarily found in the northern Pacific Ocean. Here, we used Hi-C and PacBio sequencing techniques to assemble a high-quality, chromosome-level genome for this species. The 539 Mb genome had a contig N50 of 3.43 Mb, and 755 contigs clustered into 24 chromosomal groups with an anchoring rate of 99.02%. Of the total genomic sequence, 132.74 Mb (24.39%) were annotated as repeat elements. A total of 21360 protein-coding genes were identified, of which 20787 genes (97.32%) were successfully annotated to public databases. The BUSCO evaluation indicated that 96.90% of the total orthologous genes were matched. The phylogenetic tree representing H. analis and 14 other bony fish species indicated that the H. analis genome contained 364 expanded gene families related to olfactory receptor activity, compared with the common ancestor of H. analis and Sciaenidae. Comparative genomic analysis further identified 3584 contracted gene families. Branch-site modeling identified 277 genes experiencing positive selection, which may facilitate adaptation to rocky reef environments. The genome reported here is helpful for ecological and evolutionary studies of H. analis.
Background: Single-molecule sequencing (SMS) is under rapid development and is generating increasingly long and accurate sequences. De novo assembly of genomes from SMS sequences is a critical step for many genomic studies. To keep pace with the development of SMS, many de novo assemblers for SMS data have been released. These assembly workflows fall into two categories: the correction-and-assembly strategy and the assembly-and-correction strategy, both of which are attracting increasing attention. Results: In this article we discuss the characteristics of errors in SMS sequences. We then review the de novo assemblers currently in wide use for SMS sequences. We also describe computational methods relevant to de novo assembly, including alignment methods and error correction methods. Benchmarks are provided to analyze their performance on different datasets and to offer guidance on applying these computational methods. Conclusion: We present a detailed review of the latest developments in de novo assembly and some relevant algorithms for SMS, including their rationales, solutions and results. We also provide usage guidance for the algorithms based on their benchmark results. Finally, we conclude the review by outlining some development trends in third-generation sequencing (TGS).
Reconstruction of the transcriptome by de novo assembly from next-generation sequencing (NGS) short reads provides an essential means to catalog expressed genes, identify splicing isoforms, and capture the expression detail of transcripts for organisms with no reference genome available. De novo transcriptome assembly faces many unique challenges, including alternative splicing, variable expression levels covering a dynamic range of several orders of magnitude, and artifacts introduced by reverse transcription. In the current review, we illustrate the overall strategy for applying the De Bruijn Graph (DBG) approach in de novo transcriptome assembly. We further analyze many parameters proven critical in transcriptome assembly using DBG. Among them, k-mer length, coverage depth of reads, genome complexity, and the performance of different programs are addressed in greater detail. A multi-k-mer strategy balancing efficiency and sensitivity is discussed and highly recommended for de novo transcriptome assembly. Future directions point to the combination of NGS and third-generation sequencing technology, which would greatly enhance the power of de novo transcriptomics studies.
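To make the role of the k-mer length concrete, the following minimal sketch builds a De Bruijn graph from short reads and walks a non-branching path into a contig. It is an illustration under simplifying assumptions (error-free reads, a single strand, no reverse complements), not the implementation of any assembler discussed in the review; the function names `build_de_bruijn` and `walk_unambiguous` are hypothetical.

```python
from collections import Counter, defaultdict

def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix.

    Edge counts record how many times a k-mer was seen (a coverage proxy).
    """
    graph = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor.

    Simplification: only out-degree is checked, and the walk stops on a cycle.
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]  # the single successor
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

reads = ["ATGGCGT", "GGCGTGC"]  # two overlapping toy reads
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTGC
```

A smaller k connects reads through shorter overlaps, which helps lowly expressed transcripts, but it also makes repeated (k-1)-mers, and therefore branches, more likely. This trade-off is the motivation for the multi-k-mer strategy the abstract recommends.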
Salt stress is an abiotic stress on plants, especially in saline lakes. Dunaliella, a halophilic microalga distributed throughout salt lakes and seas, can respond to different salinity stresses by regulating the expression of some genes. However, these genes, their functions, and the biological processes involved remain unclear. Profiling these salt-stress-related genes in a high-salt-tolerant Dunaliella species will help clarify the salt tolerance machinery of Dunaliella. Three D. salina_YC salt-stress groups were tested under low (0.51 mol/L), moderate (1.03 mol/L), and high (3.42 mol/L) NaCl concentrations, with one control group under a very low (0.05 mol/L) NaCl concentration, and three deep-sequenced, de novo assembled transcriptomes were obtained per group. Twelve high-quality RNA-seq libraries were obtained, in which 46585 upregulated and 47805 downregulated unigenes were found. Relative to the control, 188 common differentially expressed genes (DEGs) were screened and divided into four clusters by expression pattern. Fifteen of them, annotated in the significantly enriched Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms, were validated via qPCR. Their qPCR-based relative expression patterns were similar to their RNA-seq-based patterns. Two significant DEGs, the geranylgeranyl diphosphate synthase coding gene (1876-bp cDNA) and the diacylglycerol O-acyltransferase coding gene (2968-bp cDNA), were cloned and analyzed in silico. The total lipid content, superoxide dismutase specific activity, and beta-carotene content of D. salina_YC increased gradually with increasing salinity. In addition, the expression of 11 validated genes involved in fatty acid biosynthesis/degradation, active oxygen metabolism or carotenoid metabolism showed significant changes. Algal photochemical efficiency also diminished with increasing salinity, as did the expression of 4 photosynthesis-related genes. These results could help clarify the molecular mechanisms underlying D. salina responses to the Yuncheng Salt Lake environment and lay a foundation for further utilization of this algal resource.
Korla fragrant pear (KFP), with its special fragrance, is a unique cultivar in Xinjiang, China. To explore the molecular mechanism of chlorogenic acid (CGA) biosynthesis in KFP, samples at different developmental periods were collected for transcriptome analysis. High performance liquid chromatography analysis showed that the CGA contents of KFP at 88, 118 and 163 days after full bloom were (20.96±1.84), (12.01±0.91) and (7.16±0.41) mg/100 g, respectively, decreasing as the fruit developed. Pears from these 3 typical periods were selected for de novo transcriptome assembly, and 68059 unigenes were assembled from 444037960 clean reads. One 'phenylpropanoid biosynthesis' pathway related to CGA biosynthesis was determined, including 57 unigenes: 11 PALs, 1 PTAL, 6 4CLs, 9 C4Hs, 25 HCTs and 5 C3'Hs. The expression levels of 11 differentially expressed genes, including 1 PAL, 2 C4Hs, 3 4CLs and 5 HCTs, were consistent with the change in CGA content. Quantitative polymerase chain reaction analysis further showed that 8 unigenes involved in CGA biosynthesis were consistent with the RNA-seq data. These findings will provide a comprehensive understanding of, and valuable information for, genetic engineering and molecular breeding in KFP.
De novo transcriptome assembly is an important approach in RNA-Seq data analysis; it helps to reconstruct the transcriptome and investigate gene expression profiles without reference genome sequences. We carried out transcriptome assemblies with two RNA-Seq datasets generated from human brain and a cell line, respectively. We then determined an efficient way to yield an optimal overall assembly using three different strategies. We first assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of values of k-mer length and coverage cutoff in assembly. Lastly, we combined the assembled contigs from a range of k values to generate a final assembly. By comparing these assembly results, we found that using only one k-mer value is not enough to generate good assembly results, but combining the contigs from different k-mer values could yield longer contigs and greatly improve the overall assembly.
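The third strategy above, pooling contigs from several k values, can be sketched as follows. This is a deliberately simple illustration and not the authors' procedure: it assumes each assembly is a plain list of contig strings, treats a contig as redundant only if it is an exact substring of a longer pooled contig, and ignores reverse complements and sequencing errors. The function names `n50` and `merge_assemblies` are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

def merge_assemblies(assemblies):
    """Pool contigs from several assemblies and drop exact duplicates and
    contigs fully contained in a longer pooled contig."""
    pooled = sorted({c for contigs in assemblies for c in contigs},
                    key=len, reverse=True)
    kept = []
    for contig in pooled:
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept

# Toy assemblies from two hypothetical k values.
k_small = ["ATGGC", "TTT"]
k_large = ["ATGGCGT", "TTT"]
merged = merge_assemblies([k_small, k_large])
print(merged, n50([len(c) for c in merged]))
```

Contig N50, the continuity measure quoted in several of the genome abstracts in this listing, is computed here on the merged set; keeping the longest representative of each redundant group is one plausible reason a combined assembly can show longer contigs than any single-k run.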
The wild rice species in the genus Oryza harbor a large amount of genetic diversity that has been untapped for rice improvement. Pan-genomics has revolutionized genomic research in plants. However, rice pan-genomic studies so far have been limited mostly to cultivated accessions, with only a few close wild relatives. Advances in sequencing technologies have permitted the assembly of high-quality rice genome sequences at low cost, making it possible to construct genus-level pan-genomes across all species. In this review, we summarize progress in current research on genetic and genomic resources in Oryza, and in sequencing and computational technologies used for rice genome and pan-genome construction. For future work, we discuss the approaches and challenges in the construction of, and data access to, Oryza pan-genomes based on representative high-quality genome assemblies. The Oryza pan-genomes will provide a basis for the exploration and use of the extensive genetic diversity present in both cultivated and wild rice populations.
Soil salinization is a serious ecological problem worldwide, and information regarding the salt tolerance mechanisms of Salix is scarce. To elucidate the dynamic changes in the molecular mechanisms of Salix under salt stress, we generated gene expression profiles and examined changes in the expression of those genes. RNA-Seq was used to produce six cDNA libraries constructed from the leaves of Salix × jiangsuensis CL 'J2345' treated with NaCl for 0, 2, 6, 12, 24 and 48 h. In total, 249 million clean reads were assembled into 12,739 unigenes, all of which were clustered into 10 profiles based on their temporal expression patterns. KEGG analysis revealed that, as an early defense response, the biosynthesis pathways of cutin, suberin and wax, which are involved in cell wall structure, were activated beginning at 2 h. The expression of secondary metabolism genes, including those involved in the phenylpropanoid, flavonoid, stilbenoid, diarylheptanoid and gingerol pathways, peaked at 6 h and 24 h; the upregulated genes were mainly involved in plant hormone pathways and beta-alanine, galactose and betalain metabolism. We identified roles of key phytohormones and found ETH to be the major signaling molecule activating TFs at 12 h; ETH, ABA, IAA and SA were the key molecules at 24 h. Moreover, we found that the upregulated genes were associated with elevated levels of amino acids, sucrose, inositol, stress proteins and ROS-scavenging enzymes, contributing to the maintenance of water balance. This research constitutes the first detailed analysis of salt stress-related mechanisms in Salix and identifies potential targets for genetic manipulation to improve yields.
The white-blotched river stingray (Potamotrygon leopoldi) is a cartilaginous fish native to the Xingu River, a tributary of the Amazon River system. As a rare freshwater-dwelling cartilaginous fish in the family Potamotrygonidae, for which no member has genome sequencing information available, P. leopoldi provides evolutionary details on fish phylogeny, niche adaptation, and skeleton formation. In this study, we present its draft genome of 4.11 Gb comprising 16,227 contigs and 13,238 scaffolds, with a contig N50 of 3937 kb and a scaffold N50 of 5675 kb. Our analysis shows that P. leopoldi is a slow-evolving fish that diverged from elephant sharks about 96 million years ago. Moreover, two gene families related to the immune system (immunoglobulin heavy constant delta genes and T-cell receptor alpha/delta variable genes) exhibit expansion in P. leopoldi only. We also identified the Hox gene clusters in P. leopoldi and discovered that seven Hox genes shared by five representative fish species are missing in P. leopoldi. The RNA sequencing data from P. leopoldi and three other fish species demonstrate that fishes have a more diversified tissue expression spectrum than mammals. Our functional studies suggest that the lack of the gc gene encoding vitamin D-binding protein in cartilaginous fishes (both P. leopoldi and Callorhinchus milii) could partly explain the absence of hard bone in their endoskeleton. Overall, this genome resource provides new insights into the niche adaptation, body plan, and skeleton formation of P. leopoldi, as well as genome evolution in cartilaginous fishes.
The revolution in genome sequencing is continuing after the success of second-generation sequencing (SGS) technology. Third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high-quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). The MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications, and the release of the long-read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since longer reads can more easily span repetitive regions, despite their higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.
Mulberry (Morus spp.) is the sole plant consumed by the domesticated silkworm. However, the genome of domesticated mulberry has not yet been sequenced, and the ploidy level of this species remains unclear. Here, we report a high-quality, chromosome-level domesticated mulberry (Morus alba) genome. Analysis of genomic data and karyotype analyses confirmed that M. alba is a diploid with 28 chromosomes (2n = 2x = 28). Population genomic analysis based on resequencing of 134 mulberry accessions classified domesticated mulberry into three geographical groups, namely, the Taihu Basin of southeastern China (Hu mulberry), northern and southwestern China, and Japan. Hu mulberry had the lowest nucleotide diversity among these accessions and demonstrated obvious signatures of selection associated with environmental adaptation. Further phylogenetic analysis supports a previous proposal that multiple domesticated mulberry accessions previously classified as different species actually belong to one species. This study expands our understanding of genome evolution in the genus Morus and of the population structure of domesticated mulberry, which should facilitate mulberry breeding and improvement.
Mung bean is an economically important legume crop species that is used as a food, consumed as a vegetable, and used as an ingredient and even as a medicine. To explore the genomic diversity of mung bean, we assembled a high-quality reference genome (Vrad_JL7) that was 479.35 Mb in size, with a contig N50 length of 10.34 Mb. A total of 40,125 protein-coding genes were annotated, representing 96.9% of the genetic region. We also sequenced 217 accessions, mainly landraces and cultivars from China, and identified 2,229,343 high-quality single-nucleotide polymorphisms (SNPs). Population structure analysis revealed that the Chinese accessions diverged into two groups and were distinct from non-Chinese lines. Genetic diversity analysis based on genomic data from 750 accessions in 23 countries supported the hypothesis that mung bean was first domesticated in south Asia and introduced to east Asia, probably through the Silk Road. We constructed the first pan-genome of mung bean germplasm and assembled 287.73 Mb of non-reference sequences. Among the genes, 83.1% were core genes and 16.9% were variable. Presence/absence variation (PAV) events of nine genes involved in the regulation of the photoperiodic flowering pathway were identified as being under selection during the adaptation process to promote early flowering in the spring. Genome-wide association studies (GWASs) revealed 2,912 SNPs and 259 gene PAV events associated with 33 agronomic traits, including a SNP in the coding region of the SWEET10 homolog (jg24043) involved in crude starch content and a PAV event in a large fragment containing 11 genes for color-related traits. This high-quality reference genome and pan-genome will provide insights into mung bean breeding.
Tartary buckwheat (Fagopyrum tataricum) is an important pseudocereal crop that is strongly adapted to growth in adverse environments. Its gluten-free grain contains complete proteins with a well-balanced composition of essential amino acids and is a rich source of beneficial phytochemicals that provide significant health benefits. Here, we report a high-quality, chromosome-scale Tartary buckwheat genome sequence of ~489.3 Mb that is assembled by combining whole-genome shotgun sequencing of both Illumina short reads and single-molecule real-time long reads, sequence tags of a large DNA insert fosmid library, Hi-C sequencing data, and BioNano genome maps. We annotated 33 366 high-confidence protein-coding genes based on expression evidence. Intra-genome comparisons and comparisons with the sugar beet genome revealed an independent whole-genome duplication that occurred in the buckwheat lineage after it diverged from the common ancestor, which was not shared with rosids or asterids. The reference genome facilitated the identification of many new genes predicted to be involved in rutin biosynthesis and regulation, aluminum stress resistance, and drought and cold stress responses. Our data suggest that Tartary buckwheat's ability to tolerate high levels of abiotic stress is attributable to the expansion of several gene families involved in signal transduction, gene regulation, and membrane transport. The availability of these genomic resources will facilitate the discovery of agronomically and nutritionally important genes and the genetic improvement of Tartary buckwheat.
High-quality rice reference genomes have accelerated the comprehensive identification of genome-wide variations and research on functional genomics and breeding. Tian-you-hua-zhan has been a leading hybrid in China over the past decade. Here, we optimized the de novo genome assembly strategy for the rice indica lines Huazhan (HZ) and Tianfeng (TF), including sequencing platforms, assembly pipelines and sequence depth. The PacBio and Nanopore platforms for long-read sequencing were utilized, with the Canu, wtdbg2, SMARTdenovo, Flye, Canu-wtdbg2, Canu-SMARTdenovo and Canu-Flye assemblers. The combination of PacBio and Canu was optimal, considering the contig N50 length, contig number, assembled genome size and polishing process. The assembled contigs were scaffolded with Hi-C data, resulting in two "golden quality" rice reference genomes, and evaluated using the scaffold N50, BUSCO, and LTR assembly index. Furthermore, 42,625 and 41,815 non-transposable element genes were annotated for HZ and TF, respectively. Based on our assemblies of HZ and TF, as well as Zhenshan97, Minghui63, Shuhui498 and 9311, comprehensive variations were identified using Nipponbare as a reference. The de novo assembly strategy for rice that we optimized and the "golden quality" rice genomes that we produced for HZ and TF will benefit rice genomics and breeding research, especially with respect to uncovering the genomic basis of the elite traits of HZ and TF.
Single-molecule, real-time sequencing developed by Pacific Biosciences (PacBio) offers longer read lengths than second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly contiguous de novo assemblies obtained using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments of significant length. Additionally, PacBio's sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.
Bioinformatics methods for various RNA-seq data analyses are evolving rapidly with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process RNA-seq data to obtain accurate and comprehensive results. Here we review strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and to the transcriptome represent two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts (especially long noncoding RNAs) can still be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of the corresponding analyses.
The Y chromosome plays key roles in male fertility and reflects the evolutionary history of paternal lineages. Here, we present a de novo genome assembly of the Hu sheep with the first draft assembly of the ovine Y chromosome (oMSY), using nanopore sequencing and Hi-C technologies. The oMSY that we generated spans 10.6 Mb, from which 775 Y-SNPs were identified by applying a large panel of whole genome sequences from worldwide sheep and wild Iranian mouflons. Three major paternal lineages (HY1a, HY1b and HY2) were defined across domestic sheep, of which HY2 was newly detected. Surprisingly, HY2 forms a monophyletic clade with the Iranian mouflons and is highly divergent from both HY1a and HY1b. Demographic analysis of Y chromosomes and mitochondrial and nuclear genomes confirmed that HY2 and the maternal counterpart of lineage C represented a distinct wild mouflon population in Iran that diverged from the direct ancestor of domestic sheep, the wild mouflons in Southeastern Anatolia. Our results suggest that wild Iranian mouflons had introgressed into domestic sheep and thereby introduced this Iranian mouflon-specific lineage carrying HY2 to both East Asian and African sheep populations.
Funding: Supported by the National Science Fund for Distinguished Young Scholars (31425006) and the Chinese Academy of Forestry (CAFYBB2018ZX001).
Abstract: The white poplar (Populus alba) is widely distributed in Central Asia and Europe. There are natural populations of white poplar in the Irtysh River basin in China. It can also be cultivated and grows well in northern China. In this study, we sequenced the genome of P. alba by single-molecule real-time technology. De novo assembly of P. alba had a genome size of 415.99 Mb with a contig N50 of 1.18 Mb. A total of 32,963 protein-coding genes were identified. 45.16% of the genome was annotated as repetitive elements. Genome evolution analysis revealed that divergence between P. alba and Populus trichocarpa (black cottonwood) occurred ~5.0 Mya (3.0, 7.1). Fourfold synonymous third-codon transversion (4DTV) and synonymous substitution rate (Ks) distributions supported the occurrence of the salicoid WGD event (~65 Mya). Twelve natural populations of P. alba in the Irtysh River basin in China were sequenced to explore the genetic diversity. The average pooled heterozygosity value of P. alba populations was 0.170±0.014, which was lower than that in Italy (0.271±0.051) and Hungary (0.264±0.054). Tajima's D values showed a negative distribution, which might signify an excess of low-frequency polymorphisms and a bottleneck with later expansion of the P. alba populations examined.
Funding: Supported by the International Science and Technology Cooperation of China (2011DFA32730).
Abstract: Panax ginseng C. A. Meyer is an important traditional herb in eastern Asia. It contains ginsenosides, which are primary bioactive compounds with medicinal properties. Although ginseng has been cultivated since at least the Ming dynasty to increase production, cultivated ginseng has lower quantities of ginsenosides and lower disease resistance than ginseng grown under natural conditions. We extracted root RNA from six varieties of fifth-year P. ginseng cultivars representing four different growth conditions, and performed Illumina paired-end sequencing. In total, 163,165,706 raw reads were obtained and used to generate a de novo transcriptome that consisted of 151,763 contigs (76,336 unigenes), of which 100,648 contigs (66.3%) were successfully annotated. Differential expression analysis revealed that most differentially expressed genes (DEGs) were upregulated (246 out of 258, 95.3%) in ginseng grown under natural conditions compared with that grown under artificial conditions. These DEGs were enriched in gene ontology (GO) terms including response to stimuli and localization. In particular, some key ginsenoside biosynthesis-related genes, including HMG-CoA synthase (HMGS), mevalonate kinase (MVK), and squalene epoxidase (SE), were upregulated in wild-grown ginseng. Moreover, a high proportion of disease resistance-related genes were upregulated in wild-grown ginseng. This study is the first transcriptome analysis to compare wild-grown and cultivated ginseng, and identifies genes that may produce higher ginsenoside content and better disease resistance in the wild; these genes may have the potential to improve cultivated ginseng grown in artificial environments.
Funding: Supported by the National Basic Research Program of China (2010CB945401), the National Natural Science Foundation of China (31240038, 31171264, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (11DZ2260300).
Abstract: Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to identify the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. In order to find the similarities and differences between them, we first chose five representative assemblers belonging to the two classes, and then investigated and compared their algorithmic features in theory and their real performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they have different conceptual and practical implementations; thus each assembly method has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler using the genome-guided de novo assembly approach, and achieved good performance. Based on these results, we suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
Funding: Supported by the Province Key Research and Development Program of Zhejiang (No. 2021C02047), the Special Projects of Zhejiang Provincial Science and Technology Department (Nos. HYS-CZ-004, HYS-CZ-202208), and the 'San Nong Jiu Fang' Science and Technology Cooperation Project of Zhejiang Province (No. 2022 SN JF073).
Abstract: Hapalogenys analis (order Lobotiformes) is an economically and ecologically significant fish species. It is a typical sedentary rocky reef fish and is primarily found in the northern Pacific Ocean. Here, we used Hi-C and PacBio sequencing techniques to assemble a high-quality, chromosome-level genome for this species. The 539 Mb genome had a contig N50 of 3.43 Mb, while 755 contigs clustered into 24 chromosomal groups with an anchoring rate of 99.02%. Of the total genomic sequence, 132.74 Mb (24.39%) were annotated as repeat elements. A total of 21360 protein-coding genes were identified, of which 20787 genes (97.32%) were successfully annotated to public databases. The BUSCO evaluation indicated that 96.90% of the total orthologous genes were matched. The phylogenetic tree representing H. analis and 14 other bony fish species indicated that the H. analis genome contained 364 expanded gene families related to olfactory receptor activity, compared with the common ancestor of H. analis and Sciaenidae. Comparative genomic analysis further identified 3584 contracted gene families. Branch-site modeling identified 277 genes experiencing positive selection, which may facilitate the adaptation to rocky reef environments. The genome reported here is helpful for ecological and evolutionary studies of H. analis.
Abstract: Background: Single-molecule sequencing (SMS) is under rapid development and is generating increasingly long and accurate sequences. De novo assembly of genomes from SMS sequences is a critical step for many genomic studies. To scale well with the developing trends of SMS, many de novo assemblers for SMS have been released. These assembly workflows can be categorized into two different kinds: the correction-and-assembly strategy and the assembly-and-correction strategy, both of which are gaining more and more attention. Results: In this article we discuss the characteristics of errors in SMS sequences. We then review the currently widely applied de novo assemblers for SMS sequences. We also describe computational methods relevant to de novo assembly, including the alignment methods and the error correction methods. Benchmarks are provided to analyze their performance on different datasets and to provide use guides on applying the computational methods. Conclusion: We make a detailed review of the latest developments in de novo assembly and some relevant algorithms for SMS, including their rationales, solutions and results. In addition, we provide use guides on the algorithms based on their benchmark results. Finally, we conclude the review by giving some developing trends of third generation sequencing (TGS).
Funding: This work is supported in part by grants from the National Basic Research Program of China (Nos. 2012CB316501, 2012CB517905 and 2013CB127000) and the National Natural Science Foundation of China (Nos. 31571310 and 31271409).
Abstract: Reconstruction of the transcriptome by de novo assembly from next generation sequencing (NGS) short-sequence reads provides an essential means to catalog expressed genes, identify splicing isoforms, and capture the expression detail of transcripts for organisms with no reference genome available. De novo transcriptome assembly faces many unique challenges, including alternative splicing, variable expression levels covering a dynamic range of several orders of magnitude, artifacts introduced by reverse transcription, etc. In the current review, we illustrate the grand strategy in applying the De Bruijn Graph (DBG) approach in de novo transcriptome assembly. We further analyze many parameters proven critical in transcriptome assembly using DBG. Among them, k-mer length, coverage depth of reads, genome complexity, and performance of different programs are addressed in greater detail. A multi-k-mer strategy balancing efficiency and sensitivity is discussed and highly recommended for de novo transcriptome assembly. Future direction points to the combination of NGS and third generation sequencing technology that would greatly enhance the power of de novo transcriptomics study.
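As an illustration of why k-mer length matters in the DBG approach mentioned in the abstract above, the following is a minimal sketch, not the method of any assembler reviewed there. The function names `build_de_bruijn` and `count_edges` and the toy reads are assumptions made for this example; real assemblers also handle reverse complements, sequencing errors, and coverage cutoffs.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph from reads (hypothetical helper).

    Nodes are (k-1)-mers; each k-mer contributes one edge from its
    prefix to its suffix. Edge values count how often the k-mer occurs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def count_edges(graph):
    """Number of distinct edges (distinct k-mers) in the graph."""
    return sum(len(successors) for successors in graph.values())


if __name__ == "__main__":
    # Toy overlapping reads, invented for illustration only.
    toy_reads = ["ATGGCGTGCA", "GGCGTGCAAT", "GTGCAATCCG"]
    for k in (4, 6, 8):
        g = build_de_bruijn(toy_reads, k)
        # len(g) counts only nodes that have at least one outgoing edge.
        print(f"k={k}: nodes with out-edges={len(g)}, edges={count_edges(g)}")
```

Running the graph construction at several values of k, as in the loop above, is the starting point of a multi-k-mer strategy: small k tends to keep low-coverage transcripts connected, while large k tends to resolve repeats, and the resulting contig sets are merged afterwards.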
Funding: Supported by the National Natural Science Foundation of China (No. 31670208), the Applied Basic Research Programs of Shanxi Province of China (No. 201801D221242), the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi of China (No. 2019L0041), and the Shanxi "Project 1331".
Abstract: Salt stress is an abiotic stress for plants, especially in saline lakes. Dunaliella, a halophilic microalga distributed throughout salt lakes and seas, can respond to different salinity stresses by regulating the expression of some genes. However, these genes, their functions, and the biological processes involved remain unclear. Profiling these salt-stress-related genes in a high-salt-tolerant Dunaliella species will help clarify the salt tolerance machinery of Dunaliella. Three D. salina_YC salt-stress groups were tested under low (0.51 mol/L), moderate (1.03 mol/L), and high (3.42 mol/L) NaCl concentrations and one control group under a very low (0.05 mol/L) NaCl concentration, and 3 transcriptome results that were deep sequenced and de novo assembled were obtained per group. Twelve high-quality RNA-seq libraries with 46585 upregulated and 47805 downregulated unigenes were found. Relative to the control, 188 common differentially expressed genes (DEGs) were screened and divided into four clusters by expression pattern. Fifteen of them, annotated in the significantly enriched Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms, were validated via qPCR. Their qPCR-based relative expression patterns were similar to their RNA-seq-based patterns. Two significant DEGs, the geranylgeranyl diphosphate synthase coding gene (1876-bp cDNA) and the diacylglycerol O-acyltransferase coding gene (2968-bp cDNA), were cloned and analyzed in silico. The total lipid content, superoxide dismutase specific activity, and beta-carotene content of D. salina_YC increased gradually with increasing salinity. In addition, the expression of 11 validated genes involved in fatty acid biosynthesis/degradation, active oxygen or carotenoid metabolism showed significant changes. Algal photochemical efficiency also diminished with increasing salinity, as did the expression of 4 photosynthesis-related genes. These results could help clarify the molecular mechanisms underlying D. salina responses to the Yuncheng Salt Lake environment and lay a foundation for further utilization of this algal resource.
Funding: Supported by the Major Scientific and Technological Projects of XPCC (2020KWZ-012).
Abstract: Korla fragrant pear (KFP), with its special fragrance, is a unique cultivar in Xinjiang, China. In order to explore the molecular mechanism of chlorogenic acid (CGA) biosynthesis in KFP, samples at different development periods were collected for transcriptome analysis. High performance liquid chromatography analysis showed that CGA contents of KFP at 88, 118 and 163 days after full bloom were (20.96±1.84), (12.01±0.91) and (7.16±0.41) mg/100 g, respectively, decreasing with fruit development. Pears from these 3 typical periods were selected for de novo transcriptome assembly, and 68059 unigenes were assembled from 444037960 clean reads. One 'phenylpropanoid biosynthesis' pathway including 57 unigenes, 11 PALs, 1 PTAL, 6 4CLs, 9 C4Hs, 25 HCTs and 5 C3'Hs related to CGA biosynthesis was determined. It was found that the expression levels of 11 differentially expressed genes, including 1 PAL, 2 C4Hs, 3 4CLs and 5 HCTs, were consistent with the change of CGA content. Quantitative polymerase chain reaction analysis further showed that 8 unigenes involved in CGA biosynthesis were consistent with the RNA-seq data. These findings will provide a comprehensive understanding and valuable information on genetic engineering and molecular breeding in KFP.
Funding: Supported by the National Basic Research Program of China (Grant Nos. 2010CB945401, 2007CB108800), the National Natural Science Foundation of China (Grant Nos. 30870575, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (Grant No. 11DZ2260300).
Abstract: De novo transcriptome assembly is an important approach in RNA-Seq data analysis, and it can help us to reconstruct the transcriptome and investigate gene expression profiles without reference genome sequences. We carried out transcriptome assemblies with two RNA-Seq datasets generated from human brain and a cell line, respectively. We then determined an efficient way to yield an optimal overall assembly using three different strategies. We first assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of values of k-mer length and coverage cutoff in assembling. Lastly, we combined the assembled contigs from a range of k values to generate a final assembly. By comparing these assembly results, we found that using only one k-mer value for assembly is not enough to generate good assembly results, but combining the contigs from different k-mer values could yield longer contigs and greatly improve the overall assembly.
Funding: Supported by the Chinese Academy of Sciences "Strategic Priority Research Program" (XDA24040201), the National Key Research and Development Program of China (2020YFE0202300), and the State Key Laboratory of Plant Genomics.
Abstract: The wild rice species in the genus Oryza harbor a large amount of genetic diversity that has been untapped for rice improvement. Pan-genomics has revolutionized genomic research in plants. However, rice pan-genomic studies so far have been limited mostly to cultivated accessions, with only a few close wild relatives. Advances in sequencing technologies have permitted the assembly of high-quality rice genome sequences at low cost, making it possible to construct genus-level pan-genomes across all species. In this review, we summarize progress in current research on genetic and genomic resources in Oryza, and in sequencing and computational technologies used for rice genome and pan-genome construction. For future work, we discuss the approaches and challenges in the construction of, and data access to, Oryza pan-genomes based on representative high-quality genome assemblies. The Oryza pan-genomes will provide a basis for the exploration and use of the extensive genetic diversity present in both cultivated and wild rice populations.
Funding: The work was supported by the National Natural Science Foundation of China (31400572), the Jiangsu Provincial Natural Science Foundation (BK20141039), and the National Natural Science Foundation of China (31300556).
Abstract: Soil salinization is a serious ecological problem worldwide, and information regarding the salt tolerance mechanisms of Salix is scarce. To elucidate the dynamic changes in the molecular mechanisms of Salix under salt stress, we generated gene expression profiles and examined changes in the expression of those genes. RNA-Seq was used to produce six cDNA libraries constructed from the leaves of Salix × jiangsuensis CL 'J2345' treated with NaCl for 0, 2, 6, 12, 24 and 48 h. In total, 249 million clean reads were assembled into 12,739 unigenes, all of which were clustered into 10 profiles based on their temporal expression patterns. KEGG analysis revealed that, as an early defense response, the biosynthesis pathways of cutin, suberin and wax, which are involved in cell wall structure, were activated beginning at 2 h. The expression of secondary metabolism genes, including those involved in the phenylpropanoid, flavonoid, stilbenoid, diarylheptanoid and gingerol pathways, peaked at 6 h and 24 h; the upregulated genes were mainly involved in plant hormone pathways and beta-alanine, galactose and betalain metabolism. We identified roles of key phytohormones and found ETH to be the major signaling molecule activating TFs at 12 h; ETH, ABA, IAA and SA were the key molecules at 24 h. Moreover, we found that the upregulated genes were associated with elevated levels of amino acids, sucrose, inositol, stress proteins and ROS-scavenging enzymes, contributing to the maintenance of water balance. This research constitutes the first detailed analysis of salt stress-related mechanisms in Salix and identifies potential targets for genetic manipulation to improve yields.
Funding: Financially supported by the National Natural Science Foundation of China (Grant No. 31801049), the Major Science and Technology Innovation Program of Shanghai Municipal Education Commission, China (Grant No. 2019-01-07-00-01-E00059), and the Shanghai Nanmulin Biotechnology Company Limited.
Abstract: The white-blotched river stingray (Potamotrygon leopoldi) is a cartilaginous fish native to the Xingu River, a tributary of the Amazon River system. As a rare freshwater-dwelling cartilaginous fish in the Potamotrygonidae family, in which no member has genome sequencing information available, P. leopoldi provides evolutionary details on fish phylogeny, niche adaptation, and skeleton formation. In this study, we present its draft genome of 4.11 Gb comprising 16,227 contigs and 13,238 scaffolds, with a contig N50 of 3937 kb and a scaffold N50 of 5675 kb. Our analysis shows that P. leopoldi is a slow-evolving fish that diverged from elephant sharks about 96 million years ago. Moreover, two gene families related to the immune system (immunoglobulin heavy constant delta genes and T-cell receptor alpha/delta variable genes) exhibit expansion in P. leopoldi only. We also identified the Hox gene clusters in P. leopoldi and discovered that seven Hox genes shared by five representative fish species are missing in P. leopoldi. The RNA sequencing data from P. leopoldi and three other fish species demonstrate that fishes have a more diversified tissue expression spectrum when compared to mammals. Our functional studies suggest that lack of the gc gene encoding vitamin D-binding protein in cartilaginous fishes (both P. leopoldi and Callorhinchus milii) could partly explain the absence of hard bone in their endoskeleton. Overall, this genome resource provides new insights into the niche adaptation, body plan, and skeleton formation of P. leopoldi, as well as genome evolution in cartilaginous fishes.
Funding: Supported by the Wellcome Trust, the United Kingdom.
Abstract: The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications; the release of the long read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be easily expanded into using longer sequencing lengths, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.
Funding: The work was supported by the National Key Research and Development Project of China (no. 2019YFD1000600), the Fundamental Research Funds for the Central Universities of Northwest A&F University, China (2452619041), and the Funds of Modern Agricultural Industrial Technology System (no. CARS-18).
Abstract: Mulberry (Morus spp.) is the sole plant consumed by the domesticated silkworm. However, the genome of domesticated mulberry has not yet been sequenced, and the ploidy level of this species remains unclear. Here, we report a high-quality, chromosome-level domesticated mulberry (Morus alba) genome. Analysis of genomic data and karyotype analyses confirmed that M. alba is a diploid with 28 chromosomes (2n=2x=28). Population genomic analysis based on resequencing of 134 mulberry accessions classified domesticated mulberry into three geographical groups, namely, Taihu Basin of southeastern China (Hu mulberry), northern and southwestern China, and Japan. Hu mulberry had the lowest nucleotide diversity among these accessions and demonstrated obvious signatures of selection associated with environmental adaptation. Further phylogenetic analysis supports a previous proposal that multiple domesticated mulberry accessions previously classified as different species actually belong to one species. This study expands our understanding of genome evolution of the genus Morus and population structure of domesticated mulberry, which would facilitate mulberry breeding and improvement.
Funding: Supported by the National Key R&D Program of China (2019YFD1000700/2019YFD1000702), the China Agricultural Research System (CARS-08-G3), the Key Research and Development Program of Hebei (21326305D), the Hebei Agriculture Research System (HBCT2018070203), and the Hebei Talent Project.
Abstract: Mung bean is an economically important legume crop species that is used as a food, consumed as a vegetable, and used as an ingredient and even as a medicine. To explore the genomic diversity of mung bean, we assembled a high-quality reference genome (Vrad_JL7) that was 479.35 Mb in size, with a contig N50 length of 10.34 Mb. A total of 40,125 protein-coding genes were annotated, representing 96.9% of the genetic region. We also sequenced 217 accessions, mainly landraces and cultivars from China, and identified 2,229,343 high-quality single-nucleotide polymorphisms (SNPs). Population structure revealed that the Chinese accessions diverged into two groups and were distinct from non-Chinese lines. Genetic diversity analysis based on genomic data from 750 accessions in 23 countries supported the hypothesis that mung bean was first domesticated in south Asia and introduced to east Asia probably through the Silk Road. We constructed the first pan-genome of mung bean germplasm and assembled 287.73 Mb of non-reference sequences. Among the genes, 83.1% were core genes and 16.9% were variable. Presence/absence variation (PAV) events of nine genes involved in the regulation of the photoperiodic flowering pathway were identified as being under selection during the adaptation process to promote early flowering in the spring. Genome-wide association studies (GWASs) revealed 2,912 SNPs and 259 gene PAV events associated with 33 agronomic traits, including a SNP in the coding region of the SWEET10 homolog (jg24043) involved in crude starch content and a PAV event in a large fragment containing 11 genes for color-related traits. This high-quality reference genome and pan-genome will provide insights into mung bean breeding.
Abstract: Tartary buckwheat (Fagopyrum tataricum) is an important pseudocereal crop that is strongly adapted to growth in adverse environments. Its gluten-free grain contains complete proteins with a well-balanced composition of essential amino acids and is a rich source of beneficial phytochemicals that provide significant health benefits. Here, we report a high-quality, chromosome-scale Tartary buckwheat genome sequence of 489.3 Mb that is assembled by combining whole-genome shotgun sequencing of both Illumina short reads and single-molecule real-time long reads, sequence tags of a large DNA insert fosmid library, Hi-C sequencing data, and BioNano genome maps. We annotated 33 366 high-confidence protein-coding genes based on expression evidence. Comparisons of the intra-genome with the sugar beet genome revealed an independent whole-genome duplication that occurred in the buckwheat lineage after they diverged from the common ancestor, which was not shared with rosids or asterids. The reference genome facilitated the identification of many new genes predicted to be involved in rutin biosynthesis and regulation, aluminum stress resistance, and in drought and cold stress responses. Our data suggest that Tartary buckwheat's ability to tolerate high levels of abiotic stress is attributed to the expansion of several gene families involved in signal transduction, gene regulation, and membrane transport. The availability of these genomic resources will facilitate the discovery of agronomically and nutritionally important genes and genetic improvement of Tartary buckwheat.
Funding: Supported by the Agricultural Science and Technology Innovation Program, the Elite Young Scientists Program of CAAS, the Science Technology and Innovation Committee of Shenzhen Municipality (KQJSCX20180323140312935, AGIS-ZDKY202004), and the Dapeng New District Special Fund for Industrial Development (KY20150113).
Abstract: High-quality rice reference genomes have accelerated the comprehensive identification of genome-wide variations and research on functional genomics and breeding. Tian-you-hua-zhan has been a leading hybrid in China over the past decade. Here, de novo genome assembly strategy optimization for the rice indica lines Huazhan (HZ) and Tianfeng (TF), including sequencing platforms, assembly pipelines and sequence depth, was carried out. The PacBio and Nanopore platforms for long-read sequencing were utilized, with the Canu, wtdbg2, SMARTdenovo, Flye, Canu-wtdbg2, Canu-SMARTdenovo and Canu-Flye assemblers. The combination of PacBio and Canu was optimal, considering the contig N50 length, contig number, assembled genome size and polishing process. The assembled contigs were scaffolded with Hi-C data, resulting in two “golden quality” rice reference genomes, and evaluated using the scaffold N50, BUSCO, and LTR assembly index. Furthermore, 42,625 and 41,815 non-transposable element genes were annotated for HZ and TF, respectively. Based on our assembly of HZ and TF, as well as Zhenshan97, Minghui63, Shuhui498 and 9311, comprehensive variations were identified using Nipponbare as a reference. The de novo assembly strategy for rice we optimized and the “golden quality” rice genomes we produced for HZ and TF will benefit rice genomics and breeding research, especially with respect to uncovering the genomic basis of the elite traits of HZ and TF.
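Several abstracts in this listing, including the one above, compare assemblies by contig N50. As a minimal sketch of the standard definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly), the function below may help readers interpret those figures. The name `n50` and the example lengths are assumptions for illustration and are not taken from any of the studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the total assembly size;
    the length of the contig that crosses that threshold is the N50.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")
    if any(length <= 0 for length in lengths):
        raise ValueError("contig lengths must be positive")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding of total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Invented contig lengths in base pairs, for illustration only.
    example_contigs = [2_000_000, 3_000_000, 4_000_000, 5_000_000, 6_000_000]
    print("N50 =", n50(example_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is presumably why the study above also reports BUSCO and the LTR assembly index.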
Funding: Supported by the institutional fund of the Department of Internal Medicine, University of Iowa, USA.
Abstract: Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio's sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.
Funding: Supported by the National High Technology Research and Development Program of China (2015AA020104), the China Human Proteome Project (2014DFB30010), the National Science Foundation of China (31471239, to Leming Shi), and the 111 Project (B13016).
Abstract: Bioinformatics methods for various RNA-seq data analyses are evolving rapidly with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process RNA-seq data to obtain accurate and comprehensive results. Here we review the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and to the transcriptome are two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts (especially long noncoding RNAs) can still be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of the corresponding analyses.
Funding: Supported by the National Natural Science Foundation of China (31822052), the National Thousand Youth Talents Plan, the Natural Science Foundation of China (31802027), and the Natural Science Basic Research Plan in Shaanxi Province of China (2019JQ002).
Abstract: The Y chromosome plays key roles in male fertility and reflects the evolutionary history of paternal lineages. Here, we present a de novo genome assembly of the Hu sheep with the first draft assembly of the ovine Y chromosome (oMSY), using nanopore sequencing and Hi-C technologies. The oMSY that we generated spans 10.6 Mb, from which 775 Y-SNPs were identified by applying a large panel of whole genome sequences from worldwide sheep and wild Iranian mouflons. Three major paternal lineages (HY1a, HY1b and HY2) were defined across domestic sheep, of which HY2 was newly detected. Surprisingly, HY2 forms a monophyletic clade with the Iranian mouflons and is highly divergent from both HY1a and HY1b. Demographic analysis of Y chromosomes, mitochondrial and nuclear genomes confirmed that HY2 and the maternal counterpart of lineage C represented a distinct wild mouflon population in Iran that diverged from the direct ancestor of domestic sheep, the wild mouflons in Southeastern Anatolia. Our results suggest that wild Iranian mouflons had introgressed into domestic sheep and thereby introduced this Iranian mouflon specific lineage carrying HY2 to both East Asian and African sheep populations.