Abstract
RNA-binding proteins (RNA-BPs) play pivotal roles in alternative splicing, RNA editing, methylation, and many other biological functions. Predicting the functions of these proteins from their primary amino acid sequences has become one of the major challenges in the functional annotation of genomes. Traditional prediction methods typically extract the physicochemical properties of amino acids from sequences as initial features, ignoring motifs and the positional relationships between motifs. Moreover, the small scale and high noise of the training data reduce the accuracy and reliability of predictions. In this paper, we propose a deep learning model that predicts RNA-binding proteins from primary sequences. The model uses a two-stage convolutional neural network (CNN) to detect functional domains in protein sequences, and a long short-term memory (LSTM) neural network to obtain a fixed-length feature representation of each sequence and to learn long- and short-term dependencies between functional domains. Because all features used by the prediction algorithm are learned automatically, the model avoids the heavy manual intervention that feature selection requires in traditional machine learning. Experimental results show that the model has a clear advantage when processing large-scale sequence data.
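The pipeline described in the abstract (two convolution stages that scan for motif-like patterns, followed by an LSTM whose final state gives a fixed-length representation) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the filter widths, filter count, hidden size, pooling size, one-hot input encoding, and all function names below are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
N_FILTERS = 16   # assumed number of filters per convolution stage
HIDDEN = 8       # assumed LSTM hidden size (length of the fixed representation)

def one_hot(seq):
    """Encode a sequence as 20-dim one-hot rows (unknown residues become all zeros)."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conv_relu_pool(x, filters, width, pool):
    """1-D convolution along the sequence, ReLU, then non-overlapping max pooling."""
    conv = []
    for i in range(len(x) - width + 1):
        window = [v for row in x[i:i + width] for v in row]  # flatten the window
        conv.append([max(0.0, sum(w * v for w, v in zip(f, window)))
                     for f in filters])
    return [[max(col) for col in zip(*conv[i:i + pool])]
            for i in range(0, len(conv) - pool + 1, pool)]

def lstm_last_state(xs, W, hidden):
    """Run a single-layer LSTM over xs and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in xs:
        z = x + h + [1.0]  # current input, previous hidden state, bias term
        gates = [sum(w * v for w, v in zip(row, z)) for row in W]
        i, f, o, g = (gates[k * hidden:(k + 1) * hidden] for k in range(4))
        c = [sigmoid(fj) * cj + sigmoid(ij) * math.tanh(gj)
             for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [sigmoid(oj) * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

def init_params(seed=0):
    rng = random.Random(seed)
    return {
        "conv1": rand_matrix(N_FILTERS, 5 * len(AMINO_ACIDS), rng),  # width 5
        "conv2": rand_matrix(N_FILTERS, 3 * N_FILTERS, rng),         # width 3
        "lstm": rand_matrix(4 * HIDDEN, N_FILTERS + HIDDEN + 1, rng),
        "out": rand_matrix(1, HIDDEN + 1, rng)[0],
    }

def encode(seq, params):
    """Map a variable-length sequence to a fixed-length feature vector."""
    x = conv_relu_pool(one_hot(seq), params["conv1"], width=5, pool=2)
    x = conv_relu_pool(x, params["conv2"], width=3, pool=2)
    return lstm_last_state(x, params["lstm"], HIDDEN)

def predict(seq, params):
    """Score in (0, 1); meaningless here because the weights are untrained."""
    h = encode(seq, params)
    return sigmoid(sum(w * v for w, v in zip(params["out"], h + [1.0])))

params = init_params()
for seq in ("MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"):
    print(len(seq), len(encode(seq, params)), round(predict(seq, params), 3))
```

Both example sequences (arbitrary strings, not data from the paper) have different lengths yet map to a vector of the same size, which is the property the abstract attributes to the LSTM stage. In practice such a model would be trained with a deep learning framework rather than hand-written loops.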
Authors
李洪顺
于华
宫秀军
Li Hongshun; Yu Hua; Gong Xiujun (School of Computer Science and Technology, Tianjin University, Tianjin 30007; Tianjin Key Laboratory of Cognitive Computing and Application (Tianjin University), Tianjin 300072)
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals (北大核心)
2018, No. 1, pp. 93-101 (9 pages)
Journal of Computer Research and Development
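Funding
National Natural Science Foundation of China (61930007)
National High Technology Research and Development Program of China (863 Program) (2015BA3005)
National Basic Research Program of China (973 Program) (013CB32930X)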
Keywords
RNA-binding proteins
convolutional neural network (CNN)
long short-term memory neural network (LSTM)
feature learning
deep learning