Funding: Supported by the National Natural Science Foundation of China (Grant No. 31871269), the Hubei Provincial Natural Science Foundation of China (Grant No. 2019CFA014), and the Fundamental Research Funds for the Central Universities, China (Grant No. 2662019PY069).
Abstract: Chromatin accessibility is a highly informative structural feature for understanding the regulation of gene transcription, because it indicates the degree to which nuclear macromolecules such as proteins and RNAs can access chromosomal DNA. Studies have shown that chromatin accessibility is highly dynamic during stress response, stimulus response, and developmental transition. Moreover, physical access to chromosomal DNA in eukaryotes is highly cell-specific. Therefore, current technologies such as DNase-seq, ATAC-seq, and FAIRE-seq reveal only a portion of the open chromatin regions (OCRs) present in a given species. Thus, the genome-wide distribution of OCRs remains unknown. In this study, we developed a bioinformatics tool called CharPlant for the de novo prediction of OCRs in plant genomes. To develop this tool, we constructed a three-layer convolutional neural network (CNN) and trained it on DNase-seq and ATAC-seq datasets from four plant species. The model simultaneously learns the sequence motifs and the regulatory logic that jointly determine DNA accessibility. All of these steps are integrated into CharPlant, which can be run with a simple command line. The analyses performed with CharPlant in this study demonstrate its predictive power and computational efficiency. To our knowledge, CharPlant is the first de novo prediction tool that can identify potential OCRs across the whole genome. The source code of CharPlant and supporting files are freely available from https://github.com/Yin-Shen/CharPlant.
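To make the idea of a CNN learning sequence motifs more concrete, the following is a minimal sketch of what a single convolutional filter does to a DNA sequence: the sequence is one-hot encoded, the filter is slid along it, and the strongest response is kept. It is an illustration under stated assumptions, not CharPlant's code. The function names, the hand-written `TATA_FILTER` weights, and the use of ReLU with global max pooling are all hypothetical choices made for this example; in a trained network the filter weights are learned from the DNase-seq and ATAC-seq data, and the abstract does not specify the layer details.

```python
# Illustrative sketch only: NOT CharPlant's actual code or architecture.
# All names and weights below are assumptions made for this example.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(max(0.0, total))  # ReLU: negative responses become 0
    return scores


def global_max_pool(scores):
    """Keep the strongest filter response; 0.0 if the sequence is too short."""
    return max(scores) if scores else 0.0


# Hypothetical hand-written filter that rewards the motif "TATA"
# (+1 for the matching base, -1 for any other base at each position).
TATA_FILTER = [
    [-1.0, -1.0, -1.0, 1.0],   # position 1 prefers T
    [1.0, -1.0, -1.0, -1.0],   # position 2 prefers A
    [-1.0, -1.0, -1.0, 1.0],   # position 3 prefers T
    [1.0, -1.0, -1.0, -1.0],   # position 4 prefers A
]


def motif_score(seq, kernel=TATA_FILTER):
    """Best match of the filter anywhere in the sequence."""
    return global_max_pool(conv1d_relu(one_hot(seq), kernel))


if __name__ == "__main__":
    print(motif_score("GGCTATAGG"))  # contains TATA -> 4.0
    print(motif_score("GGGGCCCC"))   # no match anywhere -> 0.0
```

A real model of the kind described would stack several such layers with many learned filters and feed the pooled responses into a classifier that outputs an accessibility probability; the sketch only shows why a convolutional filter behaves like a motif scanner.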