Abstract
Background: Histone modifications are major factors that define chromatin states and regulate gene expression in eukaryotic cells. Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has been widely used to profile the genome-wide distribution of chromatin-associated protein factors. Some histone modifications, such as H3K27me3 and H3K9me3, usually mark broad domains in the genome ranging from kilobases (kb) to megabases (Mb) in length, resulting in diffuse patterns in the ChIP-seq data that make signal separation challenging. Most existing ChIP-seq peak-calling algorithms are based on local statistical models that do not account for multi-scale features, and a principled method for identifying scale-free broad domains has been lacking.
Methods: Here we present RECOGNICER (Recursive coarse-graining identification for ChIP-seq enriched regions), a computational method for identifying ChIP-seq enriched domains over a large range of scales. The algorithm is based on a coarse-graining approach, which uses recursive block transformations to determine the spatial clustering of locally enriched elements across multiple length scales.
Results: We apply RECOGNICER to call H3K27me3 domains from ChIP-seq data and validate the results based on the association of H3K27me3 with repressed gene expression. We show that RECOGNICER outperforms existing ChIP-seq broad-domain calling tools by identifying more whole domains rather than separated pieces.
Conclusion: RECOGNICER can be a useful bioinformatics tool for next-generation sequencing data analysis in epigenomics research.
Funding
U.S. National Institutes of Health (NIH): R35GM133712 to C.Z.; R01AI121080 and R01AI139874 to W.P.