Abstract
Plant genomes contain a large fraction of noncoding sequences. The discovery and annotation of conserved noncoding sequences (CNSs) in plants is an ongoing challenge. Here we report the application of comparative genomics to systematically identify CNSs in 50 well-annotated Gramineae genomes, using rice (Oryza sativa) as the reference. We perform multi-way whole-genome alignments against the rice genome. Based on these alignments, the rice genome is annotated with 20 conservation states (CSs) at single-nucleotide resolution using a multivariate hidden Markov model (ConsHMM). Different states show distinct enrichments for various genomic features, and the conservation scores of the CSs are highly correlated with the level of associated chromatin accessibility. We find that at least 33.5% of the rice genome is under strong selection, with more than 70% of this sequence lying outside coding regions. We generate a catalog of 855,366 regulatory CNSs, which overlap significantly with putative active regulatory elements such as promoters, enhancers, and transcription factor binding sites. Collectively, our study provides a resource for elucidating functional noncoding regions of the rice genome and an evolutionary perspective on regulatory sequences in higher plants.
Funding
Supported by the Nanjing University Deng Feng Scholars Program, the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, and the National Natural Science Foundation of China (32070656).