The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. In response to explosive data growth and increasingly diverse data types, here we present the GSA family, which expands GSA into a set of raw-data archives with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in its data model, online functionalities, and web interfaces. GSA-Human, as a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, as a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, these resources form a family dedicated to archiving rapidly growing data of diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.
Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining the sequencing strategies of PacBio, 10× Genomics, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of the 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 was significantly correlated with waist circumference in northern Han males. Moreover, significant genetic diversity in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism, was observed between northerners and southerners. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there exists a "comfort" zone for a high frequency of 677T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.
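The N50 scaffold size quoted above is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. A minimal sketch of that calculation follows, assuming only a list of scaffold lengths; the function name `n50` and the toy lengths are hypothetical illustrations, not NH1.0 data or the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))  # 80 + 70 = 150 >= 145, so this prints 70
```

A larger N50 indicates that more of the assembly sits in long scaffolds, which is why the value is commonly reported alongside a new reference genome.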
Since pangenomic study was first proposed, a dozen software tools have come into active use for pangenomic analysis. By the end of 2014, Panseq and the pan-genomes analysis pipeline (PGAP) ranked as the two most popular packages according to cumulative citations in peer-reviewed scientific publications. The functions of these software packages and tools, although they vary from one to another, include categorizing orthologous genes, calculating pangenomic profiles, integrating gene annotations, and constructing phylogenies. As epigenomic elements are gradually being revealed in prokaryotes, pangenomic databases and toolkits are expected to be extended to handle detailed functional annotations for genes and non-protein-coding sequences, including non-coding RNAs, insertion elements, and conserved structural elements. To develop better bioinformatic tools, user feedback and integration of novel features are both essential.
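As a rough illustration of what "calculating pangenomic profiles" can involve, the sketch below partitions ortholog clusters by how many genomes carry them. It assumes orthologous genes have already been grouped into clusters; the function name `pangenome_profile`, the core/accessory/unique labels (common usage, not taken from the abstract), and the toy strains are all hypothetical. It is not the method used by Panseq or PGAP.

```python
def pangenome_profile(presence):
    """Partition ortholog clusters by how many genomes contain them.

    presence: dict mapping genome name -> set of ortholog cluster IDs.
    Returns a dict of sets: the pan-genome (all clusters), core (present
    in every genome), unique (present in exactly one genome), and
    accessory (everything in between).
    """
    n_genomes = len(presence)
    counts = {}
    for clusters in presence.values():
        for cluster in clusters:
            counts[cluster] = counts.get(cluster, 0) + 1

    pan = set(counts)
    core = {c for c, k in counts.items() if k == n_genomes}
    # A cluster counts as unique only if it is not already core
    # (this matters when a single genome is supplied).
    unique = {c for c, k in counts.items() if k == 1} - core
    accessory = pan - core - unique
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Toy input: three hypothetical strains and made-up cluster IDs.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}
profile = pangenome_profile(toy)
for name in ("pan", "core", "accessory", "unique"):
    print(name, sorted(profile[name]))
```

Real tools add the harder steps this sketch takes as given, such as ortholog clustering and annotation integration.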
Funding (GSA family): Supported by grants from the National Key R&D Program of China (Grant No. 2017YFC0907502 to ZZ); the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDB38060100 and XDB38030200 to YB; XDB38050300 to WZ; XDB38030400 to JX; XDA19050302 to ZZ); the National Key R&D Program of China (Grant Nos. 2016YFC0901603 to WZ; 2017YFC1201202 to YW; 2020YFC0847000 and 2018YFD1000505 to WZ; 2016YFE0206600 to YB); the 13th Five-year Informatization Plan of the Chinese Academy of Sciences (Grant No. XXH13505-05 to YB); the Genomics Data Center Construction of the Chinese Academy of Sciences (Grant No. XXH-13514-0202 to YB); the Open Biodiversity and Health Big Data Programme of the International Union of Biological Sciences (to YB); the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07 to YB); the National Natural Science Foundation of China (Grant Nos. 32030021 and 31871328 to ZZ); and the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008 to ZZ).
Funding (NH1.0 and CASPMI): Supported by grants from the Key Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14 awarded to CZ) and the National Key R&D Program of China from the Ministry of Science and Technology of China (Grant No. 2016YFB0201702 awarded to JX, as well as Grant Nos. 2016YFC0901701 and 2018YFC0910700 awarded to XF).
Funding (pangenomic analysis tools): Supported by the National High-tech R&D Program (863 Program; Grant No. 2012AA020409) from the Ministry of Science and Technology of China; the Key Program of the Chinese Academy of Sciences (Grant No. KSZD-EW-TZ-009-02); and the National Natural Science Foundation of China (Grant Nos. 31471248 and 31271386).