Abstract
Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) remain largely unclear, and the landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, which consists of two alternately stacked convolution and pooling layers followed by a fully connected neural network, automatically learns the hidden patterns of pseudouridylation from local sequence information. Extensive validation tests demonstrated that PULSE outperforms other state-of-the-art prediction methods and achieves high prediction accuracy, enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the predictions derived from PULSE provide novel insights into the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, as well as its connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
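To make the described architecture concrete, here is a minimal sketch in plain Python. It is not the PULSE implementation (that code lives in the linked repository). The sketch one-hot encodes a fixed-length RNA window, passes it through two alternating convolution + max-pooling blocks, and then feeds the result to a fully connected hidden layer and a sigmoid output. The window length (41 nt), kernel size, filter counts, pooling size, ReLU activations, and the untrained random weights are all assumptions chosen for illustration. The function names (one_hot, conv1d_relu, max_pool, init_params, predict) are hypothetical.

```python
import math
import random

BASES = "ACGU"  # one-hot channel order (assumption)


def one_hot(seq):
    """Encode an RNA/DNA string as a list of 4-dim one-hot rows (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def conv1d_relu(x, filters, biases):
    """'Valid' 1D convolution over positions, followed by ReLU.
    x: L x C_in, filters: F x K x C_in, result: (L - K + 1) x F."""
    k, c_in = len(filters[0]), len(x[0])
    out = []
    for i in range(len(x) - k + 1):
        row = []
        for f, b in zip(filters, biases):
            s = b + sum(f[j][c] * x[i + j][c] for j in range(k) for c in range(c_in))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along positions; a trailing remainder is dropped."""
    return [[max(x[i + j][c] for j in range(size)) for c in range(len(x[0]))]
            for i in range(0, len(x) - size + 1, size)]


def init_params(window=41, k=5, f1=8, f2=16, pool=2, hidden=16, seed=0):
    """Hypothetical hyperparameters with small random (untrained) weights."""
    rng = random.Random(seed)

    def g():
        return rng.gauss(0.0, 0.1)

    l1 = (window - k + 1) // pool  # length after conv block 1
    l2 = (l1 - k + 1) // pool      # length after conv block 2
    flat = l2 * f2                 # flattened input size of the dense layer
    return {
        "window": window, "pool": pool,
        "w1": [[[g() for _ in range(4)] for _ in range(k)] for _ in range(f1)],
        "b1": [0.0] * f1,
        "w2": [[[g() for _ in range(f1)] for _ in range(k)] for _ in range(f2)],
        "b2": [0.0] * f2,
        "wh": [[g() for _ in range(flat)] for _ in range(hidden)],
        "bh": [0.0] * hidden,
        "wo": [g() for _ in range(hidden)],
        "bo": 0.0,
    }


def predict(seq, p):
    """Score a fixed-length window centered on a candidate uridine; returns a value in (0, 1)."""
    if len(seq) != p["window"]:
        raise ValueError(f"expected a {p['window']}-nt window, got {len(seq)}")
    x = one_hot(seq)
    x = max_pool(conv1d_relu(x, p["w1"], p["b1"]), p["pool"])  # conv + pool block 1
    x = max_pool(conv1d_relu(x, p["w2"], p["b2"]), p["pool"])  # conv + pool block 2
    flat = [v for row in x for v in row]
    h = [max(0.0, b + sum(w * v for w, v in zip(ws, flat)))    # fully connected layer
         for ws, b in zip(p["wh"], p["bh"])]
    z = p["bo"] + sum(w * v for w, v in zip(p["wo"], h))
    return 1.0 / (1.0 + math.exp(-z))                          # sigmoid output


if __name__ == "__main__":
    params = init_params()
    window = "A" * 20 + "U" + "G" * 20  # toy 41-nt window, candidate U at the center
    print(predict(window, params))
```

With random weights the score carries no biological meaning. The parameters would first have to be fit to labeled Ψ-site data, and a practical model would use a deep-learning framework rather than pure-Python loops. The pooling layers shrink the 41-position input to 7 positions before the dense layer, so the fully connected part sees a compact summary of the local sequence context.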
Funding
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61472205 and 81630103), the US National Science Foundation (Grant Nos. DBI-1262107 and IIS-1646333), China's Youth 1000-Talent Program, and the Beijing Advanced Innovation Center for Structural Biology.