Abstract: The mechanistic basis of cellulose biosynthesis in plants has gained ground during the last decade or so. The isolation of plant cDNA clones encoding cotton homologs of the bacterial cellulose
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31972411, 31722048, and 31630068); the Central Public-interest Scientific Institution Basal Research Fund (Grant No. Y2022PT23); the Innovation Program of the Chinese Academy of Agricultural Sciences; the Key Laboratory of Biology and Genetic Improvement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, P.R. China; and by NIFA, the Department of Agriculture, via UC-Berkeley, USA.
Abstract: Brassica oleracea has been developed into many important crops, including cabbage, kale, cauliflower, and broccoli. The genome and gene annotation of cabbage (cultivar JZS), a representative morphotype of B. oleracea, have been widely used as a common reference in biological research. Although its genome assembly has been updated twice, the current gene annotation still lacks information on untranslated regions (UTRs) and alternative splicing (AS). Here, we constructed a high-quality gene annotation (JZSv3) using a full-length transcriptome acquired by nanopore sequencing, yielding a total of 59,452 genes and 75,684 transcripts. Additionally, we re-analyzed previously reported transcriptome data related to the development of different tissues and to cold response using JZSv3 as a reference, and found that 3843 of 11,908 differentially expressed genes (DEGs) underwent AS during the development of different tissues and 309 of 903 cold-related genes underwent AS in response to cold stress. We also identified many AS genes, including BolLHCB5 and BolHSP70, whose variant transcripts displayed distinct expression patterns, highlighting the importance of JZSv3 as a pivotal reference for AS analysis. Overall, JZSv3 provides a valuable resource for exploring gene function, especially for obtaining a deeper understanding of AS regulation mechanisms.
Funding: Supported by grants CAFYBB2017ZY001 and TGB2016001 from the Fundamental Research Funds of the Chinese Academy of Forestry.
Abstract: Populus alba × P. glandulosa clone 84K, derived from South Korea, is widely cultivated in China and used as a model in the molecular research of woody plants because of its high gene transformation efficiency. Here, we combined 63-fold coverage Illumina short reads and 126-fold coverage PacBio long reads to assemble the genome. Because of the high heterozygosity level (2.1%, estimated by k-mer analysis), we used TrioCanu for genome assembly. The PacBio clean subreads of P. alba × P. glandulosa were separated into two parts according to their similarity to the parental genomes of P. alba and P. glandulosa. The two parts were assembled into two subgenomes, subgenome A (405.31 Mb, from P. alba) and subgenome G (376.05 Mb, from P. glandulosa), with contig N50 sizes of 5.43 Mb and 2.15 Mb, respectively. A high-quality P. alba × P. glandulosa genome assembly was obtained. The genome size was 781.36 Mb, with a contig N50 size of 3.66 Mb, and the longest contig was 19.47 Mb. In addition, 176.95 Mb (43.7%) and 152.37 Mb (40.5%) of repetitive elements were identified, and 38,701 and 38,449 protein-coding genes were predicted, in subgenomes A and G, respectively. For functional annotation, 96.98% of subgenome A genes and 96.96% of subgenome G genes were annotated with public databases. This de novo assembled genome will facilitate systematic and comprehensive study, such as multi-omics analysis, in the model tree P. alba × P. glandulosa.
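The contig N50 values quoted in the Populus abstract above follow a standard definition: the largest length L such that contigs of length L or longer together cover at least half of the assembly. The sketch below illustrates that definition only; the function name `contig_n50` and the toy contig lengths are assumptions made for illustration and are not taken from the paper or its assembly.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (made up for illustration, not from the 84K assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_contigs))
```

With the toy input the total is 300, so the running sum reaches the halfway point of 150 at the second-longest contig. As a separate arithmetic check on the abstract, the two reported subgenome sizes (405.31 Mb and 376.05 Mb) sum to the reported total of 781.36 Mb.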
Abstract: Baccaurea motleyana (rambai) is an underutilized fruit native to Malaysia, Indonesia and Thailand. In this study, a total of 54,779 unigenes identified from the rambai transcriptome were used for simple sequence repeat (SSR) analysis with MIcroSAtellite (MISA). A total of 20,420 SSRs were found, distributed within 37.27% of the unigenes. Mononucleotide repeats were the main type, accounting for 64.04%, followed by trinucleotide repeats (20.28%) and dinucleotide repeats (19.94%). Gene annotation against seven databases had success ratios of 68.53% (National Center for Biotechnology Information (NCBI) protein sequences), 53.68% (NCBI nucleotide sequences), 27.43% (Kyoto Encyclopedia of Genes and Genomes Ortholog), 56.0% (SwissProt), 52.44% (Protein family), 53.99% (Gene Ontology) and 26.44% (euKaryotic Orthologous Groups). Further, rambai SSRs were randomly selected and validated in B. motleyana (rambai), B. macrocarpa (tampoi), B. polyneura (jentik-jentik), B. ramiflora (pupor) and B. scortechinii (setambun).
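To make the SSR analysis in the rambai abstract above concrete, here is a minimal sketch of how perfect mono-, di- and trinucleotide repeats can be located in a sequence with a regular expression. It is a simplified stand-in and not MISA itself; the function name `find_ssrs`, the minimum repeat counts in `MIN_REPEATS`, and the toy sequence are all assumptions chosen for illustration, and the thresholds used in the study are not stated in the abstract.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# the thresholds used in the study are not given in the abstract).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # A motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip "AA" or "AAA" style motifs: those are homopolymer runs
            # and belong to the mononucleotide class.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return sorted(hits)


# Toy sequence (made up): an (AT)7 run, an (A)12 run and a (CAG)5 run.
toy = "GG" + "AT" * 7 + "CC" + "A" * 12 + "GT" + "CAG" * 5 + "TT"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

The sketch reports only perfect repeats and ignores compound or interrupted microsatellites, which dedicated tools handle separately.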
Abstract: Peptidases are essential for intracellular protein processing, signaling and homeostasis, physiological processes, and digestion of food. Moreover, peptidases are important biotechnological enzymes used in processes such as industrial food processing, leather manufacturing and the washing industry. Identification of peptidases is a crucial step in their characterization, but peptidase annotation is not a trivial task because of their large sequence diversity. In the present study, short, conserved sequence profiles were generated for all peptidase families with more than four members in the comprehensive Merops peptidase database. The sequence profiles were combined with the Homology to Peptide Pattern (Hotpep) method for automatic annotation of peptidases. The resulting method, Hotpep-protease, is standalone software that annotates protease sequences to Merops family and subgroup and is suitable for large-scale sequence analysis. Compared to the Mammalian Degradome Database, Hotpep-protease had an accuracy of 92% and a sensitivity of 96% for annotation of the human degradome. Annotation by commonly used methods (Blast and conserved domains) had an accuracy of 69% and a sensitivity of 78%. For fungal genomes, there were large differences between annotation with Hotpep-protease, Blast- and Hidden Markov Model-based annotation, and the Merops annotation, which confirms the difficulty of large-scale peptidase annotation. Manual annotation indicated that Hotpep-protease had a positive prediction rate of 0.90, compared to a positive prediction rate of 0.67 for Blast search. Hence, Hotpep-protease is a fast and highly accurate method for annotation of peptidases.
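The peptidase abstract above reports accuracy, sensitivity and a positive prediction rate. The sketch below shows the textbook definitions of these quantities computed from confusion-matrix counts. It assumes the standard definitions, which may differ from the ones the authors used; the function name `annotation_metrics` and the counts are hypothetical and are not the study's data.

```python
def annotation_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one real and one predicted positive")
    return {
        # Fraction of real peptidases that were annotated as peptidases.
        "sensitivity": tp / (tp + fn),
        # Fraction of predicted peptidases that are real (positive predictive value).
        "positive_prediction_rate": tp / (tp + fp),
        # Fraction of all calls, positive and negative, that were correct.
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only.
metrics = annotation_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```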
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62072243 and 61772273 to Dong-Jun Yu); the Natural Science Foundation of Jiangsu, China (Grant No. BK20201304 to Dong-Jun Yu); the Foundation of National Defense Key Laboratory of Science and Technology, China (Grant No. JZX7Y202001SY000901 to Dong-Jun Yu); the China Scholarship Council (Grant No. 201906840041 to Yi-Heng Zhu); the National Institute of Environmental Health Sciences, USA (Grant No. P30ES017885 to Gilbert S. Omenn); the National Cancer Institute, USA (Grant No. U24CA210967 to Gilbert S. Omenn); the National Institute of General Medical Sciences, USA (Grant Nos. GM136422 and S10OD026825 to Yang Zhang); the National Institute of Allergy and Infectious Diseases, USA (Grant No. AI134678 to Peter L. Freddolino and Yang Zhang); and the National Science Foundation, USA (Grant Nos. IIS1901191, DBI2030790, and MTM2025426 to Yang Zhang). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation, USA (Grant No. ACI1548562).
Abstract: Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. Here, we propose a new method, TripletGO, to deduce GO terms of protein-coding and noncoding genes through the integration of four complementary pipelines built on transcript expression profile, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode) and 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves function annotation accuracy significantly beyond the current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in the coupling of a new triplet network-based profiling method with the feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, the combination of multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
Funding: This work was fully funded by the Sarawak Research and Development Council through the Research Initiation Grant Scheme, grant number RDCRG/RIF/2019/13, awarded to H. H. Chung.
Abstract: The Malaysian mahseer (Tor tambroides), one of the most valuable freshwater fish in the world, is mainly targeted for human consumption. Mitogenomic data for this species are available, but genomic information is still lacking. For the first time, we sequenced the whole genome of an adult fish on both Illumina and Nanopore platforms. The hybrid genome assembly yielded a total of 1.23 Gb of genomic sequence in 44,726 contigs, with an N50 length of 44 kb and a BUSCO genome completeness of 87.6%. Four types of SSRs were detected and identified within the genome, with AT more abundant than GC. Predicted protein sequences were functionally annotated against public databases, namely GO, KEGG and COG. A maximum likelihood phylogenomic tree containing 52 Actinopterygii species and one Sarcopterygii species as outgroup was constructed, providing first insights into the genome-based evolutionary relationship of T. tambroides with other ray-finned fish. These data are crucial in facilitating the study of population genomics, species identification, morphological variations, and evolutionary biology, which are helpful in the conservation of this species.
Funding: Supported by the Australian Research Council (Grant No. DP130103029).
Abstract: Banksia is a significant element in the vegetation of southwestern Australia, a biodiversity hotspot of global significance. In particular, Banksia hookeriana is a species of significant economic and ecological importance in the region. For better conservation and management, we report an overview of the transcriptome of B. hookeriana using RNA-seq and de novo assembly. We generated a total of 202.7 million reads (18.91 billion nucleotides) from four leaf samples from four plants of B. hookeriana, and assembled 59,063 unigenes (average size = 1098 bp) through de novo transcriptome assembly. Among them, 39,686 unigenes were annotated against the Swiss-Prot, Clusters of Orthologous Groups (COG), and NCBI non-redundant (NR) protein databases. We showed that there was approximately one single nucleotide polymorphism (SNP) per 5.6-7.1 kb in the transcriptome, and the ratio of transitional to transversional polymorphisms was approximately 1.82. We compared unigenes of B. hookeriana to those of Arabidopsis thaliana and Nelumbo nucifera through sequence homology, Gene Ontology (GO) annotation, and KEGG pathway analyses. The comparative analysis revealed that unigenes of B. hookeriana were closely related to those of N. nucifera. B. hookeriana, N. nucifera, and A. thaliana shared similar GO annotations but different distributions in KEGG pathways, indicating that B. hookeriana has adapted to dry Mediterranean-type shrublands via regulating expression of specific genes. In total, 1927 potential simple sequence repeat (SSR) markers were discovered, which could be used in genotyping and genetic diversity studies of the Banksia genus. Our results provide a valuable sequence resource for further study in Banksia.
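The transition-to-transversion ratio mentioned in the Banksia abstract above can be illustrated with a short sketch. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); any other substitution is a transversion. The function names and the toy SNP list are assumptions for illustration and are not the study's data or pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True if the substitution ref -> alt is a transition."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    pair = {ref, alt}
    # Both bases in the same chemical class means a transition.
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Return transitions divided by transversions for (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Toy SNP calls (made up): three transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))
```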