Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences on Stem Cell and Regenerative Medicine Research (Grant No. XDA01040405) and the National High Technology Research and Development Program of China (863 Program, Grant No. 2012AA022502) to XF.
Abstract: The ENCyclopedia Of DNA Elements (ENCODE) project is an international research consortium that aims to identify all functional elements in the human genome sequence. The second phase of the project comprised 1640 datasets from 147 different cell types, yielding a set of 30 publications across several journals. These data revealed that 80.4% of the human genome displays some functionality in at least one cell type. Many of these regulatory elements are physically associated with one another and further form a network or three-dimensional conformation to affect gene expression. These elements are also related to sequence variants associated with diseases or traits. All these findings provide new insights into the organization and regulation of genes and the genome, and serve as an expansive resource for understanding human health and disease.
Funding: Supported by grants from the Key Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14, awarded to CZ) and the National Key R&D Program of China from the Ministry of Science and Technology of China (Grant No. 2016YFB0201702, awarded to JX; Grant Nos. 2016YFC0901701 and 2018YFC0910700, awarded to XF).
Abstract: Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large sample sizes in Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining the sequencing strategies of PacBio, 10x Genomics, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 was significantly correlated with waist circumference in northern Han males. Moreover, significant genetic diversity in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism, was observed between northerners and southerners. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there exists a “comfort” zone for a high frequency of 677T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.
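The N50 scaffold size quoted in this abstract is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal Python sketch follows, assuming the scaffold lengths are already available as a list of integers. The function name `n50` and the toy lengths are hypothetical illustrations, not values from the NH1.0 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    # Walk the scaffolds from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 is half of the total 300
```

N50 rewards a few very long scaffolds, so it is usually read alongside the total assembly size rather than on its own.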
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Stem Cell and Regenerative Medicine Research (Grant No. XDA01040405), the National High-tech R&D Program of China (863 Program, 2012AA022502), and the National “Twelfth Five-Year” Plan for Science & Technology Support of China (2013BAI01B09), awarded to XF, and by the National Natural Science Foundation of China (Grant No. 31471236), awarded to YL.
Abstract: Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community.
Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB13020500), the National Natural Science Foundation of China (NSFC) (Grant Nos. 91131905, 31471199, and 91631304), the Key Research Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14 to CZ), the NSFC (Grant Nos. 31440057 and 31701081 to WC), the 111 Project (Grant No. B13003 to WC and DZ), and the Innovation Promotion Association of the Chinese Academy of Sciences (Grant Nos. 2016098 to DZ and 2019103 to AC).
Abstract: Postzygotic mutations are acquired in normal tissues throughout an individual’s lifetime and hold clues for identifying mutagenic factors. Here, we investigated postzygotic mutation spectra of healthy individuals using optimized ultra-deep exome sequencing of time-series samples from the same volunteer as well as samples from different individuals. In blood, sperm, and muscle cells, we resolved three common types of mutational signatures. Signatures A and B represent clocklike mutational processes, and polymorphisms of epigenetic regulation genes influence the proportion of signature B in mutation profiles. Notably, signature C, characterized by C>T transitions at GpCpN sites, tends to be a feature of diverse normal tissues. Mutations of this type are likely to occur early during embryonic development, as supported by their relatively high allelic frequencies, presence in multiple tissues, and decrease in occurrence with age. Almost none of the public datasets for tumors feature this signature, except for 19.6% of samples of clear cell renal cell carcinoma with increased activation of the hypoxia-inducible factor 1 (HIF-1) signaling pathway. Moreover, the accumulation of signature C in the mutation profile was accelerated in a human embryonic stem cell line with drug-induced activation of HIF-1α. Thus, embryonic hypoxia may explain this novel signature across multiple normal tissues. Our study suggests that hypoxic conditions in an early stage of embryonic development are a crucial factor inducing C>T transitions at GpCpN sites, and that individuals’ genetic background may also influence their postzygotic mutation profiles.
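To make the phrase "C>T transitions at GpCpN sites" concrete, the sketch below classifies a single-base substitution by its trinucleotide context. It follows the common convention of reporting the mutated base as a pyrimidine, so a G>A change on the reference strand is reverse-complemented into a C>T change. The function names, the 0-based coordinate, and the toy sequences are assumptions for illustration; this is not the authors' signature-extraction pipeline.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pyrimidine_context(ref_seq, pos, alt):
    """Return (trinucleotide, substitution) with the mutated base as C or T.

    `pos` is a 0-based index into `ref_seq`; it must have a neighbor on
    each side so that a full trinucleotide can be extracted.
    """
    if pos < 1 or pos > len(ref_seq) - 2:
        raise ValueError("position needs one flanking base on each side")
    tri = ref_seq[pos - 1:pos + 2].upper()
    alt = alt.upper()
    if tri[1] in "AG":
        # Purine reference base: describe the event on the opposite strand.
        tri = revcomp(tri)
        alt = alt.translate(COMPLEMENT)
    return tri, f"{tri[1]}>{alt}"


def is_gpcpn_c_to_t(ref_seq, pos, alt):
    """True if the substitution is a C>T change with a 5' G (GpCpN)."""
    tri, substitution = pyrimidine_context(ref_seq, pos, alt)
    return substitution == "C>T" and tri[0] == "G"


# Hypothetical toy sequences, for illustration only.
print(is_gpcpn_c_to_t("AGCTA", 2, "T"))  # C>T in GCT context
print(is_gpcpn_c_to_t("TAGCT", 2, "A"))  # G>A in AGC, i.e. C>T in GCT on the other strand
print(is_gpcpn_c_to_t("ACCTA", 2, "T"))  # C>T in CCT context, not GpCpN
```

Counting how many calls fall into each of the 96 possible pyrimidine-centered contexts gives the mutation spectrum on which signature decomposition methods operate.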
Funding: Supported by grants from the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX2-EW-R-01-04), the Natural Science Foundation of China (Grant Nos. 90919024 and 30900831), the Ministry of Science and Technology of China as the National Science and Technology Key Project (Grant No. 2008ZX10004-013), the Special Foundation Work Program (Grant No. 2009FY120100), and the National Basic Research Program (Grant No. 2011CB944100).
Abstract: Although strand-biased gene distribution (SGD) was described some two decades ago, the underlying molecular mechanisms and their relationship remain elusive. Its facets include, but are not limited to, the degree of biases, the strand preference of genes, and the influence of background nucleotide composition variations. Using a dataset composed of 364 non-redundant bacterial genomes, we sought to illustrate our current understanding of SGD. First, when we divided the collection of bacterial genomes into non-polC and polC groups according to their possession of DnaE isoforms that correlate closely with taxonomy, the SGD of the polC group stood out more significantly than that of the non-polC group. Second, when examining horizontal gene transfer, coupled with gene functional conservation (essentiality) and expressivity (level of expression), we found that they all contributed to SGD. Third, we further demonstrated a weaker G-dominance on the leading strand of the non-polC group but strong purine dominance (both G and A) on the leading strand of the polC group. We propose that strand-biased nucleotide composition plays a decisive role in SGD, since the polC-bearing genomes are not only AT-rich but also have pronounced purine-rich leading strands, and we believe that a special mutation spectrum, which leads to a strong purine asymmetry and a strong strand-biased nucleotide composition, and functional selection for genes and their functions are both at work.
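Strand compositional asymmetries such as the G-dominance and purine dominance described in this abstract are commonly summarized with skew statistics. The sketch below computes two of them for one strand: GC skew, (G - C) / (G + C), and a purine excess, (A + G - C - T) / (A + C + G + T). It assumes the leading-strand sequence has already been extracted as a string; the function name `strand_skews` and the toy sequence are hypothetical, and the abstract does not state which statistics the authors used.

```python
from collections import Counter


def strand_skews(seq):
    """Return (gc_skew, purine_excess) for one DNA strand.

    Characters other than A, C, G, and T (for example N) are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    # Positive GC skew means more G than C on this strand.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    total = a + c + g + t
    # Positive purine excess means more A+G than C+T on this strand.
    purine_excess = (a + g - c - t) / total if total else 0.0
    return gc_skew, purine_excess


# Hypothetical toy sequence, for illustration only.
print(strand_skews("GGGCAATT"))  # prints (0.5, 0.25)
```

On real genomes these values are usually computed in sliding windows or cumulatively along the chromosome, so that the sign change near the replication origin and terminus becomes visible.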
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2016YFC0901700 and 2018YFC0910700) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2014085), China.
Abstract: The era of brain science across the world. The human brain is the most complex organ in the human body. It comprises billions of neurons and supporting cells in a complex network, managing everything from physical functions to human thoughts and feelings. Dysfunctions of this complex network caused by both genetic and environmental factors result in many brain disorders. Several large brain projects have been launched, aiming to understand how the brain works in health and disease, including the Human Brain Project of the European Union, the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative of the United States, and the Brain Mapping by Integrated Neurotechnologies for Disease Studies (Brain/MINDS) project of Japan.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2016YFC0901700, 2016YFC0901603, 2017YFC0907502, 2017YFC0908402, and 2017YFC0907405) and the Key Research Program of the Chinese Academy of Sciences, China (Grant No. KJZD-EW-L14).
Abstract: Gliomas are one of the most common types of brain cancers. Numerous efforts have been devoted to studying the mechanisms of glioma genesis and identifying biomarkers for diagnosis and treatment. To help further investigations, we present a comprehensive database named GliomaDB. GliomaDB includes 21,086 samples from 4303 patients and integrates genomic, transcriptomic, epigenomic, clinical, and gene-drug association data regarding glioblastoma multiforme (GBM) and low-grade glioma (LGG) from The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), the Chinese Glioma Genome Atlas (CGGA), the Memorial Sloan Kettering Cancer Center Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT), the US Food and Drug Administration (FDA), and PharmGKB. GliomaDB offers a user-friendly interface for two main types of functionalities. The first comprises queries of (i) somatic mutations, (ii) gene expression, (iii) microRNA (miRNA) expression, and (iv) DNA methylation. In addition, queries can be executed at the gene, region, and base level. Second, GliomaDB allows users to perform survival analysis, coexpression network visualization, multi-omics data visualization, and targeted drug recommendations based on personalized variations. GliomaDB bridges the gap between glioma genomics big data and the delivery of integrated information for end users, thus enabling both researchers and clinicians to effectively use publicly available data and empowering the progression of precision medicine in glioma. GliomaDB is freely accessible at http://bigd.big.ac.cn/gliomaDB.
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16010602) and the National Natural Science Foundation of China (81670109, 81700097, 81870097, 81700116).
Abstract: Erythropoiesis is a complex and sophisticated multi-stage process regulated by a variety of factors, including the transcription factor GATA1 and non-coding RNA. GATA1 is regarded as an essential transcriptional regulator promoting transcription of erythroid-specific genes, including long non-coding RNAs (lncRNAs). Here, we comprehensively screened lncRNAs that were potentially regulated by GATA1 in erythroid cells. We identified a novel lncRNA, PCED1B-AS1, and verified its role in promoting erythroid differentiation of K562 erythroid cells. We also proposed a model in which PCED1B-AS1 participates in erythroid differentiation via dynamic chromatin remodeling involving GATA1. The relationship between lncRNA and chromatin in the process of erythroid differentiation remains to be revealed, and our study provides a preliminary exploration.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2016YFC0901700), the National High-tech R&D Program of China (863 Program, Grant Nos. 2015AA020101 and 2015AA020108), the National “12th Five-Year Plan” for Science & Technology Support of China (Grant No. 2013BAI01B09), and the National Natural Science Foundation of China (Grant Nos. 31471115 and 81670109).
Abstract: With the advances of genome-wide sequencing technologies and bioinformatics approaches, a large number of datasets of normal and malignant erythropoiesis have been generated and made public to researchers around the world. Collection and integration of these datasets greatly facilitate basic research and clinical diagnosis and treatment of blood disorders. Here we provide a brief introduction to the most popular omics data resources of normal and malignant hematopoiesis, including some integrated web tools, to help users get better equipped to perform common analyses. We hope this review will promote the awareness and facilitate the usage of public