Abstract
RNA base interactions play an important role in stabilizing the three-dimensional structure of RNA, and accurately predicting them can assist RNA three-dimensional structure prediction. However, the data available for predicting RNA base interactions are scarce, so models cannot fully learn the feature distribution of the data. In addition, two characteristics of the data (symmetry and class imbalance) degrade model performance. To address insufficient model learning and these data characteristics, tpRNA, a high-performance deep-learning method for predicting RNA base interactions, is proposed. tpRNA is the first method to introduce transfer learning into the RNA base interaction prediction task, which mitigates the insufficient learning caused by the small amount of data. It also proposes an efficient loss function and a feature extraction module that exploit the strengths of transfer learning and convolutional neural networks in feature learning to alleviate the data-characteristic problems. Results show that transfer learning reduces the model bias caused by limited data, that the proposed loss function improves model training, and that the feature extraction module extracts more effective features. Compared with state-of-the-art methods, tpRNA also has a significant advantage when the input features are of low quality.
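The abstract names two data characteristics, symmetry and class imbalance, but does not give the form of tpRNA's loss function. The sketch below is a generic illustration only, not the authors' method: it assumes a pairwise score matrix in which entry (i, j) is the predicted probability that bases i and j interact, enforces symmetry by averaging the matrix with its transpose, and uses a class-weighted binary cross-entropy to up-weight the rare interacting pairs. The names `symmetrize`, `weighted_bce`, and `pos_weight` are hypothetical.

```python
import math


def symmetrize(scores):
    """Average a square matrix with its transpose so that s[i][j] == s[j][i]."""
    n = len(scores)
    return [[(scores[i][j] + scores[j][i]) / 2.0 for j in range(n)]
            for i in range(n)]


def weighted_bce(probs, labels, pos_weight):
    """Class-weighted binary cross-entropy over the upper triangle (i < j).

    Positive (interacting) pairs are multiplied by pos_weight, a hypothetical
    knob for countering class imbalance; negatives keep weight 1.
    """
    eps = 1e-12  # keeps log() away from 0
    total, count = 0.0, 0
    n = len(probs)
    for i in range(n):
        for j in range(i + 1, n):  # symmetric map: each pair counted once
            p = min(max(probs[i][j], eps), 1.0 - eps)
            if labels[i][j] == 1:
                total += -pos_weight * math.log(p)
            else:
                total += -math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


# Toy 3-base example with made-up numbers (not data from the paper).
raw = [[0.0, 0.9, 0.2],
       [0.7, 0.0, 0.1],
       [0.4, 0.3, 0.0]]
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]

probs = symmetrize(raw)
# One positive pair and two negative pairs, so weight positives by 2 / 1.
loss = weighted_bce(probs, labels, pos_weight=2.0)
```

Setting `pos_weight` to the ratio of negative to positive pairs is one common heuristic for imbalanced labels. Whether tpRNA uses a weighting of this kind is not stated in the abstract.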
Authors
WANG Xiaofei (王晓飞)
FAN Xueqiang (樊学强)
LI Zhangwei (李章维)
College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2023, No. 3, pp. 164-172 (9 pages)
Funding
National Natural Science Foundation of China (61573317).
Keywords
RNA base interactions
Transfer learning
Data characteristics
Loss function
Convolutional neural networks