Word Embedding Based Machine Learning Method for RNA Flexibility Prediction
Abstract: The dynamics of RNA molecules are closely related to their functions. Flexibility, one of the most fundamental characteristics of RNA dynamics, has been widely used to study folding properties, structural stability, and ligand-binding ability. Experimental measurement of RNA flexibility is often time-consuming and labor-intensive, so a fast and accurate theoretical method for predicting it is urgently needed. To this end, we propose RNAfwe, a machine learning method that predicts RNA flexibility using word embedding to extract features from RNA sequences. We compare RNAfwe with RNAflex, a comparable sequence-based method. Compared with RNAflex (One-Hot), which uses one-hot encoding, RNAfwe achieves higher Pearson correlation coefficients (PCC) of 0.5017 and 0.4704 on the training and test sets, respectively, indicating that word embedding extracts features more relevant to flexibility from RNA sequences than one-hot encoding does. Compared with RNAflex (PSSM), which uses evolutionary information, RNAfwe performs slightly worse, but RNAflex (PSSM) requires a sufficient number of known homologous sequences. This work contributes to the study of RNA dynamic properties and supports the wider use of word embedding in bioinformatics research.
Source: Biophysics (《生物物理学》), 2024, Issue 2, pp. 23-30 (8 pages)
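The abstract contrasts one-hot encoding with word embedding and reports performance as a Pearson correlation coefficient (PCC). The sketch below illustrates those three ingredients only. It is not the RNAfwe implementation: the abstract does not say how sequences are tokenized or which embedding model is trained, so the overlapping k-mer "words" with `k=3` are an assumption, and the function names `one_hot`, `kmer_tokens`, and `pearson` are hypothetical. Training the embedding itself (for example with a word2vec-style model on the k-mer tokens) is not shown.

```python
import math

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers ("words").

    k=3 is an assumed value; the abstract does not state the word size.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)
```

In a pipeline of this kind, the k-mer tokens would be mapped to learned dense vectors in place of the sparse one-hot rows, and `pearson` would compare predicted per-nucleotide flexibility against reference values.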