Funding: Supported by the National Natural Science Foundation of China (61873198 and 62132015 to L.G., 62002275 to Y.Y., and 61621003 to S.Z.), the National Key Research and Development Program of China (2019YFA0709501), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16021400 and XDPB17 to S.Z.), and the Key-Area Research and Development of Guangdong Province (2020B1111190001).
Abstract: Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim can generate high-fidelity data by comparing the performance of single-cell clustering and detection of chromosomal high-order structures with raw data. Furthermore, scHi-CSim allows the sequencing depth and the number of simulated replicates to be adjusted. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
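The idea of sampling interactions within distance strata can be illustrated with a small sketch. This is not the scHi-CSim implementation: the function name `resample_by_distance`, the binned-matrix input, and the `depth_scale` parameter are all assumptions made for illustration. The sketch treats each diagonal of an intra-chromosomal contact matrix (all bin pairs at the same genomic distance) as one stratum and redraws its contacts with replacement, so the distance-decay profile is preserved while the total depth changes.

```python
import random
from collections import Counter


def resample_by_distance(contacts, depth_scale, rng):
    """Redraw a symmetric contact matrix one distance stratum at a time.

    contacts    : square list of lists with non-negative integer counts
    depth_scale : target depth relative to the input (hypothetical parameter)
    rng         : random.Random instance, passed in for reproducibility
    """
    n = len(contacts)
    out = [[0] * n for _ in range(n)]
    for d in range(n):  # d is the bin distance, i.e. the stratum
        positions = list(range(n - d))
        weights = [contacts[i][i + d] for i in positions]
        total = sum(weights)
        if total == 0:
            continue  # nothing observed at this distance
        n_draw = round(total * depth_scale)
        # Sampling with replacement, weighted by observed counts,
        # is a multinomial draw over the bin pairs of this stratum.
        drawn = Counter(rng.choices(positions, weights=weights, k=n_draw))
        for i, count in drawn.items():
            out[i][i + d] = count
            out[i + d][i] = count  # keep the matrix symmetric
    return out


# Toy 3-bin matrix (made-up counts), resampled to half the depth.
toy = [[4, 2, 0],
       [2, 6, 2],
       [0, 2, 8]]
half_depth = resample_by_distance(toy, 0.5, random.Random(0))
```

Because each stratum is drawn separately, the number of contacts at every distance scales by the same factor, which is one simple way to vary depth without distorting the distance dependence.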
Funding: Supported by the National Institutes of Health, United States (Grant Nos. U01CA200147 and DP1HD087990), awarded to SZ.
Abstract: Interactions between chromatin segments play a large role in functional genomic assays, and developments in genomic interaction detection methods have shown interacting topological domains within the genome. Among these methods, Hi-C plays a key role. Here, we present the Genome Interaction Tools and Resources (GITAR), software for comprehensive Hi-C data analysis, including data preprocessing, normalization, and visualization, as well as analysis of topologically-associated domains (TADs). GITAR is composed of two main modules: (1) HiCtool, a Python library to process and visualize Hi-C data, including TAD analysis; and (2) a processed data library, a large collection of human and mouse datasets processed using HiCtool. HiCtool leads the user step-by-step through a pipeline that goes from the raw Hi-C data to the computation, visualization, and optimized storage of intra-chromosomal contact matrices and TAD coordinates. A large collection of standardized processed data allows users to compare different datasets in a consistent way, while saving the time needed to obtain data for visualization or additional analyses. More importantly, GITAR enables users without any programming or bioinformatic expertise to work with Hi-C data. GITAR is publicly available at http://genomegitar.org as open-source software.
Funding: This work is supported by the National Basic Research Program of China (Nos. 2016YFA0100703 and 2015CB964800) and the National Natural Science Foundation of China (No. 31271354).
Abstract: Background: In eukaryotic genomes, chromatin is not randomly distributed in cell nuclei, but instead is organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With the advancement in 3C (chromosome conformation capture) based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the first publication of the Hi-C assay in 2009, many bioinformatic tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing the contact matrix and higher-level interpretation, either providing a whole workflow pipeline or focusing on a particular process. Results: This article reviews the general Hi-C data processing workflow and the currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results. Conclusions: The Hi-C assay is a powerful tool to investigate higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary for better understanding the regulatory function of genome organization.
Funding: The authors thank the National Natural Science Foundation of China (Grant No. 91131901), Fudan Graduate Students Innovative Grant (EZH1322383/001/002) and PSCIRT for financial support.
Abstract: Transposable elements (TEs) have not been regarded purely as "junk DNA" for some time, following continual discoveries of their multifunctional roles in eukaryotic genomes. Alu, a SINE family and one of the most important and abundant TEs still active in the human genome, has demonstrated indispensable regulatory functions at the sequence level, but its spatial roles are still unclear. Technologies based on 3C (chromosome conformation capture) have revealed the three-dimensional structure of chromatin and made it possible to study distal chromatin interactions in the genome. To find the role TEs play in distal regulation in the human genome, we compiled newly released Hi-C data, TE annotations, histone marker annotations, and genome-wide methylation data for correlation analysis. We found that the density of Alu elements showed a strong positive correlation with the level of chromatin interactions (hESC: r = 0.9, P < 2.2 × 10^-16; IMR90 fibroblasts: r = 0.94, P < 2.2 × 10^-16) and also a significant positive correlation with some remote functional DNA elements such as enhancers and promoters (Enhancer: hESC: r = 0.997, P = 2.3 × 10^-4; IMR90: r = 0.934, P = 2 × 10^-2; Promoter: hESC: r = 0.995, P = 3.8 × 10^-4; IMR90: r = 0.996, P = 3.2 × 10^-4). Further investigation involving GC content and methylation status showed that the GC content of Alu-covered sequences shared a similar pattern with that of the overall sequence, suggesting that Alu elements also function as a provider of GC nucleotides and CpG sites. In all, our results suggest that Alu elements may act as an alternative parameter to evaluate Hi-C data, which is confirmed by the correlation analysis of Alu elements and histone markers. Moreover, the GC-rich Alu sequence can bring high GC content and methylation flexibility to the regions with more distal chromatin contact, regulating the transcription of tissue-specific genes.
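The r values reported above are correlation coefficients between a per-region density and an interaction level. As a minimal sketch of that kind of calculation, assuming Pearson's r (the abstract does not name the statistic) and using made-up toy numbers rather than any data from the study, the coefficient can be computed as follows. The names `pearson_r`, `alu_density`, and `interaction_level` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centered values: deviations from each sequence's mean.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / spread


# Toy, made-up values per genomic region (not data from the study).
alu_density = [1.0, 2.0, 3.0, 4.0, 5.0]
interaction_level = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(alu_density, interaction_level)
```

A value of r near 1 indicates that regions with higher density also tend to have higher interaction levels; the associated P-value would need a separate significance test.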