Funding: Supported by grants from the Key Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14 awarded to CZ) and the National Key R&D Program of China from the Ministry of Science and Technology of China (Grant No. 2016YFB0201702 awarded to JX, as well as Grant Nos. 2016YFC0901701 and 2018YFC0910700 awarded to XF).
Abstract: To unravel the genetic mechanisms underlying diseases and physiological traits, comprehensive sequencing analyses of large samples from Chinese populations are required. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI), a project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy individuals from most regions of China. Because the two existing reference genomes for Han Chinese (YH and HX1) were both derived from southern individuals, we constructed NH1.0, a new reference genome from a northern individual, by combining PacBio sequencing, 10x Genomics linked reads, and Bionano optical mapping. Using this integrated approach, we obtained a scaffold N50 of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 against YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of the 597 participants and identified 24.85 million (M) single-nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In an association analysis with the collected phenotypes, we found that the T allele of rs1549293 in KAT8 was significantly associated with waist circumference in northern Han males. Moreover, significant genetic differentiation between northerners and southerners was observed in MTHFR, TCN2, FADS1, and FADS2, genes associated with circulating folate, vitamin B12, or lipid metabolism. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there is a "comfort" zone for a high frequency of 677T between latitudes 35° N and 45° N. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific datasets of genetic variants for use in personalized and precision medicine.
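The scaffold N50 reported above is a standard contiguity statistic: after sorting scaffolds from longest to shortest, it is the length of the scaffold at which the cumulative length first reaches half of the total assembly size. The following is a minimal Python sketch of that definition; the function name `n50` and the toy lengths are illustrative assumptions, not code or values from the NH1.0 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted longest-first; the N50 is the length of the first
    sequence at which the running total reaches half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when the input is empty.
    raise ValueError("lengths must not be empty")


if __name__ == "__main__":
    # Hypothetical toy scaffold lengths in Mb (not NH1.0 data).
    toy_scaffolds = [60, 50, 40, 20, 10, 5]
    print(n50(toy_scaffolds))  # 50
```

In the toy example the total is 185 Mb, so the threshold is 92.5 Mb; the two longest scaffolds reach 110 Mb, so the N50 is the length of the second one, 50 Mb. A larger N50 therefore means that half of the assembly sits in fewer, longer pieces.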
Funding: Supported by grants from the National Key R&D Program of China (Grant Nos. 2021YFC0863300, 2020YFC0848900, and 2016YFE0206600), the National Natural Science Foundation of China (Grant No. 82161148009), the Strategic Priority Research Program of the Chinese Academy of Sciences, China (Grant Nos. XDA19090116 and XDB38060100), the Open Biodiversity and Health Big Data Programme of the International Union of Biological Sciences, International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008), the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2017141).
Abstract: COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmission of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected between March 16 and June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. We also found more than 1000 intra-host single-nucleotide variants (iSNVs). Several of them co-occurred, indicating possible interactions or coevolution among them. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology analysis revealed five distinct transmission clusters. The largest cluster consisted of 74 viruses derived from different geographic locations across Pakistan and formed a deep hierarchical structure, indicating extensive and persistent nationwide transmission of the virus, which was probably attributable to a signature mutation of this cluster (G8371T in ORF1ab). Furthermore, 28 putative international introductions were identified, several of which are consistent with epidemiological investigations. Overall, this study inferred the possible pathways of introduction and transmission of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.
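An iSNV is a site where the viral population within a single sample carries more than one base, so it is detected from per-position read counts rather than from the consensus genome. The sketch below shows one common style of threshold-based call in Python. The function name `call_isnv` and its thresholds (minimum depth 100, minor-allele frequency of at least 5%) are hypothetical choices for illustration, not the criteria used in this study.

```python
from typing import Dict, Optional, Tuple


def call_isnv(
    position: int,
    base_counts: Dict[str, int],
    min_depth: int = 100,          # hypothetical minimum read depth
    min_minor_freq: float = 0.05,  # hypothetical minimum minor-allele frequency
) -> Optional[Tuple[int, str, str, float]]:
    """Flag a genome position as an iSNV from per-base read counts.

    Returns (position, major_base, minor_base, minor_frequency) when the
    second most common base passes both thresholds, otherwise None.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too few reads to call a minority variant reliably
    ranked = sorted(base_counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2 or ranked[1][1] == 0:
        return None  # only one base observed: no intra-host variation
    (major_base, _), (minor_base, minor_count) = ranked[0], ranked[1]
    minor_freq = minor_count / depth
    if minor_freq < min_minor_freq:
        return None  # minority base too rare to distinguish from error
    return position, major_base, minor_base, minor_freq


if __name__ == "__main__":
    # Hypothetical read counts at one position of one sample.
    print(call_isnv(12345, {"A": 900, "G": 100, "C": 0, "T": 0}))
    # (12345, 'A', 'G', 0.1)
```

Requiring both a depth floor and a frequency floor is a typical way to separate genuine minority variants from sequencing noise; real pipelines usually add further filters, such as base-quality and strand-bias checks.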