A Deep Learning Model for Predicting RNA-Binding Proteins Only from Primary Sequences
Cited by: 9
Abstract: RNA-binding proteins (RNA-BPs) play pivotal roles in alternative splicing, RNA editing, methylation, and many other biological functions. Predicting the functions of these proteins from their primary amino acid sequences has become one of the major challenges in the functional annotation of genomes. Traditional prediction methods typically extract physicochemical properties of amino acids from the sequence as initial features, ignoring motifs and the positional relationships between motifs. In addition, training data that are small in scale and noisy reduce prediction accuracy and reliability. In this paper, we propose a deep learning model that predicts RNA-binding proteins from primary sequences. The model uses a two-stage convolutional neural network (CNN) to detect functional domains in protein sequences, and a long short-term memory (LSTM) network to obtain a fixed-length feature representation of each sequence and to learn long- and short-term dependencies between functional domains. All features used by the prediction algorithm are learned automatically, which avoids the heavy manual intervention that feature selection requires in traditional machine learning. Experimental results show that the model has a clear advantage when processing large-scale sequence data.
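The pipeline described in the abstract (two convolution stages as motif detectors, then an LSTM whose final state is a fixed-length representation) can be sketched as a forward pass in plain Python. This is a minimal illustration, not the authors' implementation: the one-hot encoding, filter counts, filter width, pooling size, hidden size, random untrained weights, and all function names (`one_hot`, `conv1d_relu`, `max_pool`, `lstm_last_state`, `predict`, `init_params`) are assumptions, since the abstract gives none of these details.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a sequence as 20-dim one-hot vectors (unknown residues become all zeros)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [[1.0 if index.get(aa) == i else 0.0 for i in range(20)] for aa in seq]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x, filters, width):
    """Valid 1-D convolution along the sequence, then ReLU; each filter scans for one pattern."""
    out = []
    for t in range(len(x) - width + 1):
        window = [v for step in x[t:t + width] for v in step]
        out.append([max(0.0, sum(w * v for w, v in zip(f, window))) for f in filters])
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(col) for col in zip(*x[t:t + size])]
            for t in range(0, len(x) - size + 1, size)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_last_state(x, W, hidden):
    """Run a single-layer LSTM and return its final hidden state (always `hidden` values)."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for step in x:
        inp = step + h + [1.0]  # current input, previous hidden state, bias term
        z = [sum(w * v for w, v in zip(row, inp)) for row in W]
        i = [sigmoid(v) for v in z[0:hidden]]                  # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]         # forget gate
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]     # output gate
        g = [math.tanh(v) for v in z[3 * hidden:4 * hidden]]   # candidate cell values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def init_params(seed=0, n1=8, n2=8, width=3, hidden=6):
    """Random, untrained weights; every size here is an illustrative assumption."""
    rng = random.Random(seed)
    return {
        "width": width, "hidden": hidden,
        "conv1": rand_matrix(n1, width * 20, rng),
        "conv2": rand_matrix(n2, width * n1, rng),
        "lstm": rand_matrix(4 * hidden, n2 + hidden + 1, rng),
        "out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
    }


def predict(seq, params):
    """Return (binding score in (0, 1), fixed-length feature vector) for one sequence."""
    x = one_hot(seq)
    x = max_pool(conv1d_relu(x, params["conv1"], params["width"]), 2)  # stage 1
    x = max_pool(conv1d_relu(x, params["conv2"], params["width"]), 2)  # stage 2
    h = lstm_last_state(x, params["lstm"], params["hidden"])
    return sigmoid(sum(w * v for w, v in zip(params["out"], h))), h
```

Calling `predict` on sequences of different lengths returns feature vectors of the same size, which is the property the abstract attributes to the LSTM stage. With untrained weights the score itself carries no meaning; a real model would fit the parameters on labeled RNA-binding and non-binding sequences.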
Authors: Li Hongshun (李洪顺), Yu Hua (于华), Gong Xiujun (宫秀军). School of Computer Science and Technology, Tianjin University, Tianjin 30007; Tianjin Key Laboratory of Cognitive Computing and Application (Tianjin University), Tianjin 300072
Affiliation: Tianjin University
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2018, No. 1, pp. 93-101 (9 pages)
Funding: National Natural Science Foundation of China (61930007); National High Technology Research and Development Program of China (863 Program) (2015BA3005); National Basic Research Program of China (973 Program) (013CB32930X)
Keywords: RNA-binding proteins; convolutional neural network (CNN); long short-term memory neural network (LSTM); feature learning; deep learning