N6-methyladenosine (m6A) is an important RNA methylation modification involved in regulating diverse biological processes across multiple species. Hence, the identification of m6A modification sites provides valuable insight into the biological mechanisms of complex diseases at the post-transcriptional level. Although a variety of identification algorithms have been proposed recently, most of them capture the features of m6A modification sites by focusing on the sequential dependencies of nucleotides at different positions in RNA sequences, while ignoring the structural dependencies of nucleotides in their three-dimensional structures. To overcome this issue, we propose a cross-species end-to-end deep learning model, namely CR-NSSD, which conducts a cross-domain representation learning process that integrates nucleotide structural and sequential dependencies for RNA m6A site identification. Specifically, CR-NSSD first obtains pre-coded representations of RNA sequences by incorporating position information into single-nucleotide states using chaos game representation theory. It then constructs a cross-domain reconstruction encoder to learn the sequential and structural dependencies between nucleotides. CR-NSSD is trained for m6A site identification by minimizing the reconstruction and binary cross-entropy losses. Extensive experiments comparing CR-NSSD with several state-of-the-art m6A identification algorithms demonstrate its promising performance. Moreover, the results of cross-species prediction indicate that integrating sequential and structural dependencies allows CR-NSSD to capture general features of m6A modification sites shared among different species, thus improving the accuracy of cross-species identification.
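The pre-coding step mentioned in the abstract can be illustrated with a minimal sketch of the classic chaos game representation (CGR). This is not the CR-NSSD implementation: the corner assignment, the starting point (0.5, 0.5), and the function name `cgr_encode` are assumptions chosen for illustration, and the abstract does not say which variant the model uses. The sketch only shows how a CGR coordinate depends on both the nucleotide and its position in the sequence.

```python
from typing import Dict, List, Tuple

# Assumed corner layout of the unit square (a common CGR convention;
# the layout used by CR-NSSD is not given in the abstract).
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "U": (1.0, 0.0),
}


def cgr_encode(seq: str) -> List[Tuple[float, float]]:
    """Return one (x, y) CGR point per nucleotide of an RNA sequence.

    Starting from the centre of the unit square, each step moves halfway
    towards the corner of the current nucleotide, so every point reflects
    the current base and all bases before it.
    """
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    points: List[Tuple[float, float]] = []
    # Treat DNA-style input as RNA by mapping T to U.
    for base in seq.upper().replace("T", "U"):
        if base not in CORNERS:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    # The same nucleotide receives different coordinates at different positions.
    print(cgr_encode("AA"))  # [(0.25, 0.25), (0.125, 0.125)]
```

Under these assumptions, the per-position coordinates could serve as position-aware input features for a downstream encoder; how CR-NSSD combines them with single-nucleotide states is not specified here.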
Funding: This work was supported in part by the National Natural Science Foundation of China (62373348), the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01D05), the Tianshan Talent Training Program (2023TSYCLJ0021), and the Pioneer Hundred Talents Program of Chinese Academy of Sciences.