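Life omics big data is an important fundamental and strategic national resource, and it is of great significance for supporting basic research and applied innovation in the life sciences, promoting the innovative development of the bioeconomy, and safeguarding national security. As data volumes continue to grow, the security management of life omics big data has become an increasingly prominent issue. The National Genomics Data Center (NGDC), addressing major national strategic needs in population health and sustainable social development, has established a research system for the submission and storage, security management, open sharing, and integrative mining of life and health big data, and has developed a series of data security management policies and measures. This paper focuses on security management across the full life cycle of life omics big data, discusses a security management framework for such data, comprehensively analyzes the security management issues involved in data submission, storage, management, and sharing, and summarizes NGDC's achievements in this area. Finally, it looks ahead to future directions for the security management of life omics big data, including improving the data grading and classification system, advancing graded data security management technologies, and strengthening off-site disaster recovery, with the aim of achieving secure management and sustainable development of life omics big data.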
Brown algae (Chromista, Ochrophyta, Phaeophyceae) are a large group of multicellular algae that play important roles in the ocean's ecosystem and biodiversity. However, a poor molecular basis for studying their phylogenetic evolution and novel metabolic characteristics has hampered progress in the field. In this study, we sequenced the de novo transcriptomes of 18 major species of brown algae in China, covering six orders and seven families, using the high-throughput sequencing platform Illumina HiSeq 2000. From the transcriptome data of these 18 species and publicly available genome data of Ectocarpus siliculosus and Phaeodactylum tricornutum, we identified 108 nuclear-generated orthologous genes and clarified the phylogenetic relationships among these brown algae based on a multigene method. These brown algae could be separated into two clades: Clade Ishigeales-Dictyotales and Clade Ectocarpales-Laminariales-Desmarestiales-Fucales. The former was at the base of the phylogenetic tree, indicating its early divergence, while the latter was divided into two branches, with Order Fucales diverging from Orders Ectocarpales, Laminariales, and Desmarestiales. In our analysis of taxonomy-contentious species, Sargassum fusiforme and Saccharina sculpera were found to be closely related to genera Sargassum and Saccharina, respectively, while Petalonia fascia showed a possible relation to genus Scytosiphon. The study provided molecular evidence for the phylogenetic taxonomy of brown algae.
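Addressing major national strategic needs in population health and sustainable social development, the National Genomics Data Center (NGDC), since its establishment in 2019, has built an initial system for the submission, storage, management, and sharing of multi-dimensional omics data that has independent intellectual property rights, is secure and controllable, and spans a broad range of fields. The system covers basic omics data resources, national human genetic resources, important strategic biological resources, biosafety resources, and bioinformatics analysis tools and platforms, providing important resources and reference information for research on population health, public safety, breeding improvement, biodiversity, and related areas. To date, NGDC stores and manages 27.6 PB of data, and its data accession numbers are recommended or recognized by major global publishing groups including Springer Nature, Elsevier, Wiley, and Taylor & Francis. Although NGDC has for six consecutive years been described by Nucleic Acids Research, an authoritative international journal in the field, as a major international biological data center alongside NCBI in the United States and EBI in Europe, a gap remains between it and world-class data centers. Looking ahead, NGDC will focus on intelligent data curation, integrated data retrieval, a biological big data cloud platform, and cutting-edge algorithms and tools, while stepping up efforts in securing funding, training talent, and international cooperation, in order to build an internationally leading genome science data center and support scientific and technological innovation and self-reliance in China's life and health sciences.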
All eukaryotic genomes have genes with introns of variable sizes. As far as spliceosomal introns are concerned, there are at least three basic parameters to stratify introns across diverse eukaryotic taxa: size, number, and sequence context. The number parameter is highly variable in lower eukaryotes, especially among protozoan and fungal species, where it ranges from less than 4% to 78% of the genes. Over greater evolutionary time scales, the number parameter undoubtedly increases, as observed in higher plants and higher vertebrates, reaching greater than 12.5 exons per gene on average among mammalian genomes. The size parameter is more complex, with multiple modes appearing to be at work. Aside from intronless genes, there are three other types of intron-containing genes: those with half-sized, minimal, and size-expandable introns. The half-sized introns have only been found in a limited number of genomes among protozoan and fungal lineages, and the other two types are prevalent in all animal and plant genomes. Among the size-expandable introns, the sizes of plant introns are expansion-limited in that the large introns exceeding 1000 bp are fewer in number and transposon-free as compared to the large introns among animals, where the larger introns are filled with transposable elements and appear expansion-flexible, reaching several kilobasepairs (kbp) and even thousands of kbp in size. Most of the intron parameters can be studied as signatures of the specific splicing machineries of different eukaryotic lineages and are highly relevant to the regulation of gene expression and functionality. In particular, the transcription-splicing-export coupling of eukaryotic intron dispensing leads to a working hypothesis that all intron parameters have evolved to be efficient and function-related in processing and routing the spliced transcripts.
An organ unique to mammals, the mammary gland develops 90% of its mass after birth and experiences the pregnancy-lactation-involution cycle (PL cycle) during reproduction. To understand mammogenesis at the transcriptomic level, and using a ribo-minus RNA-seq protocol, we acquired greater than 50 million reads each for the mouse mammary gland during pregnancy (day 12 of pregnancy), lactation (day 14 of lactation), and involution (day 7 of involution). The pregnancy-, lactation-, and involution-related sequencing reads were assembled into 17344, 10160, and 13739 protein-coding transcripts and 1803, 828, and 1288 non-coding RNAs (ncRNAs), respectively. Differentially expressed genes (DEGs) were defined in the three samples, which comprised 4843 DEGs (749 up-regulated and 4094 down-regulated) from pregnancy to lactation and 4926 DEGs (4706 up-regulated and 220 down-regulated) from lactation to involution. Besides the obvious and substantive up- and down-regulation of the DEGs, we observed that lysosomal enzymes were highly expressed and that their expression coincided with milk secretion. Further analysis of transcription factors such as Trps1, Gtf2i, Tcf7l2, Nupr1, Vdr, Rb1, and Aebp1, and ncRNAs such as mir-125b, Let7, mir-146a, and mir-15 has enabled us to identify key regulators in mammary gland development and the PL cycle.
Funding: The National Natural Science Foundation of China under contract Nos. 31140070, 31271397 and 41206116; the algal transcriptome sequencing was supported by the 1KP Project (www.onekp.com).
Funding: Supported by the National Natural Science Foundation of China (31101063, 31271386) and the National Basic Research Program of China (2010CB126604, 2011CB944100, 2011CB944101).
Funding: Supported by a grant from the Ministry of Science and Technology of China (2011CB944100, 2011CB944101).