Funding: Supported by the National Basic Research Program (973 Program) of the Ministry of Science and Technology of the People's Republic of China (2006CB910404 to JY).
Abstract: Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We have now extended this observation to higher eukaryotes and identified a similar periodicity, 175 nt in length, in Caenorhabditis elegans. In defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of the DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails that facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5'-end to the 3'-end. We also found that nucleosome-enriched regions are GC-rich compared with nucleosome-free sequences, as purine content is positively correlated with GC content. This sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally. Based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We conclude that nucleosomes are not randomly positioned on DNA sequences yet bind to different genome regions with variable stability, that well-positioned nucleosomes leave sequence signatures on DNA, and that the statistical positioning of nucleosomes across the genome can be decoded computationally based on these sequence signatures.
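To make the compositional measurements above concrete, the following is a minimal Python sketch of how a purine (AG) gradient and GC content could be profiled along a sequence. It is an illustration only and is not the authors' pipeline or their HMM: the function names, the window size, and the use of a least-squares slope as the "gradient" are all assumptions introduced here.

```python
def base_fraction(seq, bases):
    """Fraction of unambiguous bases (A, C, G, T) in seq that belong to `bases`."""
    valid = [b for b in seq.upper() if b in "ACGT"]
    if not valid:
        return 0.0
    return sum(b in bases for b in valid) / len(valid)


def purine_fraction(seq):
    """Purine (A or G) fraction of a sequence."""
    return base_fraction(seq, "AG")


def gc_fraction(seq):
    """GC fraction of a sequence."""
    return base_fraction(seq, "GC")


def windowed_profile(seq, window, step, metric):
    """Apply `metric` to successive windows, read from the 5'-end to the 3'-end."""
    return [metric(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def slope(values):
    """Least-squares slope of values against window index (hypothetical gradient measure)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Toy sequence (not real data): purine content rises from 5' to 3'.
toy = "CCTTCATAAGGA"
profile = windowed_profile(toy, window=4, step=4, metric=purine_fraction)
gradient = slope(profile)  # positive value indicates a 5'-to-3' purine increase
```

A positive slope over windows of a nucleosome-protected sequence would correspond to the kind of 5'-to-3' purine gradient described in the abstract; the same `windowed_profile` helper can be called with `gc_fraction` to compare GC content between regions.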
Funding: Supported in part by grants from the National Institutes of Health (NIH) (Grant Nos. R01GM125632 to KC, R01HL133254 and R01HL148338 to JPC and KC, R01CA207098 and R01CA207109 to ML). QC is supported by the U.S. Department of Defense (Grant Nos. W81XWH17-1-0357 and W81XWH-19-1-0563), the American Cancer Society (Grant No. RSG-15-192-01), the National Cancer Institute (NCI), NIH (Grant Nos. R01CA208257 and P50CA180995 DRP), and the Northwestern University Polsky Urologic Cancer Institute, USA.
Abstract: Numerous studies of the relationships between epigenomic features have focused on their strong correlation across the genome, likely because such relationships can be easily identified by many established methods for correlation analysis. However, two features with little correlation may still colocalize at many genomic sites to implement important functions. There is no bioinformatic tool for researchers to specifically identify such feature pairs. Here, we develop a method to identify feature pairs in which two features have maximal colocalization and minimal correlation (MACMIC) across the genome. By MACMIC analysis of 3306 feature pairs in 16 human cell types, we reveal a dual role of CCCTC-binding factor (CTCF) in epigenetic regulation of cell identity genes. Although super-enhancers are associated with activation of target genes, only a subset of super-enhancers colocalized with CTCF regulate cell identity genes. At super-enhancers colocalized with CTCF, CTCF is required for the active marker H3K27ac in cell types requiring the activation, and also required for the repressive marker H3K27me3 in other cell types requiring repression. Our work demonstrates the biological utility of the MACMIC analysis and reveals a key role for CTCF in epigenetic regulation of cell identity. The code for MACMIC is available at https://github.com/bxia888/MACMIC.
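The distinction between colocalization and correlation can be illustrated with a small Python sketch. This is not the MACMIC implementation (the abstract does not define the score, and the repository above is the authoritative source); it simply assumes, for illustration, that each feature is a list of signal values over the same genomic bins, that colocalization is measured as a Jaccard index of bins where both signals reach a hypothetical threshold, and that correlation is the Pearson coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length signal vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def colocalization(x, y, threshold):
    """Jaccard index of bins where each signal is at or above `threshold` (assumed definition)."""
    both = sum(a >= threshold and b >= threshold for a, b in zip(x, y))
    either = sum(a >= threshold or b >= threshold for a, b in zip(x, y))
    return both / either if either else 0.0


# Toy signals (not real data): both features are present in every bin,
# but their intensities vary independently of each other.
feature_a = [5, 9, 5, 9]
feature_b = [9, 9, 5, 5]
coloc = colocalization(feature_a, feature_b, threshold=1)
corr = pearson(feature_a, feature_b)
```

In this toy case the two features overlap in every bin while their signal intensities are uncorrelated, which is the kind of feature pair that a correlation-only analysis would miss.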
Funding: Supported by grants from the National Basic Research Program (973 Program) (Nos. 2006CB910401, 2006CB910403 and 2006CB910404) awarded to JY and HS; the Chinese Ministry of Science and Technology; and the National Science and Technology Key Project (No. 2008ZX10004-013) awarded to JY.
Abstract: Transposons are sequence elements widely distributed among genomes of all three kingdoms of life, providing genomic changes and playing significant roles in genome evolution. Trichomonas vaginalis is an excellent model system for transposon study since its genome (~160 Mb) has been sequenced and is composed of ~65% transposons and other repetitive elements. In this study, we primarily report the identification of Kolobok-type transposons (termed tvBac) in T. vaginalis and the results of transposase sequence analysis. We categorized 24 novel subfamilies of the Kolobok element, including one autonomous subfamily and 23 non-autonomous subfamilies. We also identified a novel H2CH motif in tvBac transposases based on multiple sequence alignment. In addition, our phylogenetic analysis suggests that tvBac and Mutator transposons may have evolved independently from a common ancestor. Our results provide basic information for understanding the function and evolution of tvBac transposons in particular and other related transposon families in general.
Funding: Supported by faculty start-up funding provided by The Methodist Hospital Research Institute, Texas, United States.
Abstract: The breadth of the enrichment site for post-translational trimethylation of histone H3 at lysine 4 (H3K4me3) on chromatin has attracted great attention recently. H3K4me3, an extensively studied histone modification, is reported to promote gene transcription by directing preinitiation complex assembly through interaction with effector proteins, e.g.,