Abstract
Interactions between RNAs and RNA-binding proteins play an important role in gene regulation. Many deep learning methods have been proposed for predicting RNA-protein binding sites. Most of them do not take the RNA-binding protein as a model input, which restricts the scale of the deep learning model. To address this problem, we propose a deep learning model whose input includes the RNA-binding protein, so that a larger training set can be used to mine knowledge shared across RNA-protein binding sites. The model feeds the RNA sequence through a convolutional neural network and then a gated recurrent unit to extract sequence features. The sequence features are concatenated with the one-hot encoding of the RNA-binding protein and passed to a fully connected layer. A sigmoid unit then outputs the probability that the RNA-binding protein binds the RNA sequence. On two authoritative datasets, the method shows some advantage over other models.
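The pipeline described in the abstract (convolution, gated recurrent unit, concatenation with the protein's one-hot code, fully connected layer, sigmoid output) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation: the layer sizes, the random untrained weights, the omission of bias terms, the use of the last GRU state as the sequence feature, and all function names (`init_params`, `predict`, and so on) are assumptions made only for illustration.

```python
import math
import random

ALPHABET = "ACGU"  # assumed RNA alphabet; any other symbol encodes as all zeros


def one_hot_rna(seq):
    """Encode an RNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def one_hot_rbp(index, num_rbps):
    """Encode the identity of an RNA-binding protein as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_rbps)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x, kernels, width):
    """Valid 1-D convolution along the sequence, followed by ReLU."""
    out = []
    for t in range(len(x) - width + 1):
        window = [v for pos in x[t:t + width] for v in pos]  # flatten window
        out.append([max(0.0, s) for s in matvec(kernels, window)])
    return out


def gru_last_state(xs, Wz, Wr, Wh, hidden):
    """Run a bias-free GRU over xs and return its final hidden state."""
    h = [0.0] * hidden
    for x in xs:
        z = [sigmoid(a) for a in matvec(Wz, x + h)]  # update gate
        r = [sigmoid(a) for a in matvec(Wr, x + h)]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(Wh, x + gated)]  # candidate state
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h


def init_params(num_rbps, filters=8, width=5, hidden=6, dense=10, seed=0):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "num_rbps": num_rbps, "width": width, "hidden": hidden,
        "conv": rand_matrix(rng, filters, width * len(ALPHABET)),
        "Wz": rand_matrix(rng, hidden, filters + hidden),
        "Wr": rand_matrix(rng, hidden, filters + hidden),
        "Wh": rand_matrix(rng, hidden, filters + hidden),
        "fc": rand_matrix(rng, dense, hidden + num_rbps),
        "out": rand_matrix(rng, 1, dense),
    }


def predict(params, seq, rbp_index):
    """Return the predicted binding probability of one protein to one sequence."""
    conv = conv1d_relu(one_hot_rna(seq), params["conv"], params["width"])
    feats = gru_last_state(conv, params["Wz"], params["Wr"], params["Wh"],
                           params["hidden"])
    joint = feats + one_hot_rbp(rbp_index, params["num_rbps"])  # concatenation
    dense = [max(0.0, a) for a in matvec(params["fc"], joint)]
    return sigmoid(matvec(params["out"], dense)[0])


params = init_params(num_rbps=3)
prob = predict(params, "AUGGCUACGUAGCUAGC", rbp_index=1)
print(f"binding probability (untrained weights): {prob:.3f}")
```

Because the protein identity enters only as an extra input vector, a single set of convolutional and recurrent weights can be trained on the pooled data of all proteins, which is the mechanism the abstract credits for enlarging the training set.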
Authors
梅杰
何如吉
吕强
Mei Jie; He Ruji; Lü Qiang (School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China; Jiangsu Province Key Laboratory for Information Processing Technologies, Suzhou 215006, Jiangsu, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2022, Issue 3, pp. 40-44 (5 pages)
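Funding
National Natural Science Foundation of China (31801108)
Open Project of the Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS1843)
Priority Academic Program Development of Jiangsu Higher Education Institutions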