A Probabilistic Generative Model of Protein Loop Fragment Structure

A generative probabilistic model for Loop modeling

Abstract: Predicting a protein's three-dimensional structure from its amino acid sequence remains one of the most important unsolved problems in computational biology, and predicting the structure of a protein's Loop segments is one of its main difficulties. Based on a first-order Markov model, this paper presents a probabilistic model of protein Loop structure. Once trained, the model can be sampled, given the amino acid sequence and secondary structure information, to generate real-valued dihedral angle pairs for a Loop segment. These dihedral angle pairs, as used in the Ramachandran plot, describe the protein's structure. The model is trained, and inference is carried out, with the Mocapy DBN toolkit. Two experiments evaluate the model: a test of how well it reproduces the distribution of Loop dihedral angles, using the Kullback-Leibler (KL) divergence and angular deviation as criteria, and de novo Loop structure prediction on 8 free-modeling targets from CASP8. Compared with the most popular state-of-the-art protein structure prediction methods, the model improves Loop prediction accuracy, so that the sampled dihedral angle pairs lie closer to the distribution found in real Loop structures, and it also improves the accuracy of the full protein backbone in de novo prediction. Moreover, because the model is a generative probabilistic model, it is in theory also less biased.
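The abstract names KL divergence and angular deviation as evaluation criteria but does not give their exact form. The Python sketch below shows one common way to compute both on (φ, ψ) pairs in degrees. The 10° bin width, the pseudocount, and the choice to average |Δφ| and |Δψ| are assumptions for illustration, not the paper's definitions, and all function names are hypothetical.

```python
import math


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mean_angular_deviation(pred, native):
    """Mean absolute angular difference over all phi and psi angles (degrees).

    pred, native: equal-length lists of (phi, psi) pairs in degrees.
    Assumed definition: average of |dphi| and |dpsi| over all residues.
    """
    if not pred or len(pred) != len(native):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for (p_phi, p_psi), (n_phi, n_psi) in zip(pred, native):
        total += angular_difference(p_phi, n_phi) + angular_difference(p_psi, n_psi)
    return total / (2 * len(pred))


def ramachandran_histogram(angles, bins=36, pseudocount=1e-3):
    """Normalized bins x bins histogram of (phi, psi) pairs over [-180, 180)^2."""
    width = 360.0 / bins
    counts = [[pseudocount] * bins for _ in range(bins)]
    for phi, psi in angles:
        # min() guards against (x % 360.0) rounding up to exactly 360.0
        i = min(bins - 1, int(((phi + 180.0) % 360.0) // width))
        j = min(bins - 1, int(((psi + 180.0) % 360.0) // width))
        counts[i][j] += 1.0
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two histograms of the same shape (all Q > 0)."""
    return sum(pi * math.log(pi / qi)
               for p_row, q_row in zip(p, q)
               for pi, qi in zip(p_row, q_row)
               if pi > 0)


native = [(-63.0, -43.0), (-60.0, -45.0), (80.0, 10.0)]
sampled = [(-70.0, -40.0), (-65.0, -35.0), (75.0, 0.0)]
print(mean_angular_deviation(sampled, native))  # 6.666...
print(kl_divergence(ramachandran_histogram(native), ramachandran_histogram(sampled)))
```

Wrapping the difference matters because -179° and 179° are only 2° apart. KL divergence is asymmetric; taking P as the native Loop distribution and Q as the sampled one is one reasonable choice, but the direction used in the paper is not stated in the abstract. The pseudocount keeps every bin of Q non-zero so the logarithm is always defined.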

Authors: Yang Peng, Lü Qiang, Yang Lingyun, Wu Jinzhen, Wen Wei

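Affiliation: Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, School of Computer Science and Technology, Soochow University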

Source: Computers and Applied Chemistry, 2010, No. 5, pp. 573-576 (4 pages). Indexed in CAS, CSCD and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (60970055)

Keywords: protein Loop; first-order Markov model; probabilistic generative model; bivariate von Mises distribution
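The keywords name the model's ingredients: a first-order Markov chain and bivariate von Mises distributions over (φ, ψ). The Python sketch below shows how such a model can be sampled given an amino acid string and secondary structure labels. It assumes a hidden-Markov structure in the style of TorusDBN (ref. 3), in which each hidden state emits an amino acid, a secondary structure label (H/E/C) and a (φ, ψ) pair drawn from the sine variant of the bivariate von Mises distribution. The abstract states neither the exact network structure nor which von Mises variant is used; the three states and all parameter values are made up for illustration rather than trained with Mocapy, and every name is hypothetical.

```python
import math
import random

# Toy, hand-picked parameters for illustration only (NOT trained values).
INIT = [0.4, 0.3, 0.3]                      # P(h_1 = k)
TRANS = [[0.80, 0.10, 0.10],                # P(h_t = k | h_{t-1} = j), row j
         [0.10, 0.80, 0.10],
         [0.20, 0.20, 0.60]]
AA_EMIT = [{"A": 0.5, "L": 0.4, "G": 0.1},  # helix-like state
           {"V": 0.5, "I": 0.4, "G": 0.1},  # strand-like state
           {"G": 0.5, "P": 0.3, "N": 0.2}]  # loop-like state
SS_EMIT = [{"H": 0.90, "E": 0.05, "C": 0.05},
           {"H": 0.05, "E": 0.90, "C": 0.05},
           {"H": 0.10, "E": 0.10, "C": 0.80}]
# Sine-model parameters per state: (mu, nu, kappa1, kappa2, lambda), radians.
VM_PARAMS = [(math.radians(-63), math.radians(-43), 4.0, 4.0, 0.0),
             (math.radians(-120), math.radians(130), 3.0, 3.0, 1.0),
             (math.radians(80), math.radians(10), 2.0, 2.0, 0.0)]
FLOOR = 1e-6  # emission probability for symbols a state does not list


def sine_model_unnormalized(phi, psi, mu, nu, k1, k2, lam):
    """Unnormalized bivariate von Mises (sine model) density."""
    return math.exp(k1 * math.cos(phi - mu) + k2 * math.cos(psi - nu)
                    + lam * math.sin(phi - mu) * math.sin(psi - nu))


def sample_sine_model(mu, nu, k1, k2, lam, rng):
    """Rejection sampling with a uniform proposal on the torus."""
    bound = math.exp(k1 + k2 + abs(lam))  # >= the density everywhere
    while True:
        phi = rng.uniform(-math.pi, math.pi)
        psi = rng.uniform(-math.pi, math.pi)
        if rng.random() * bound <= sine_model_unnormalized(phi, psi, mu, nu, k1, k2, lam):
            return phi, psi


def sample_angles(aa_seq, ss_seq, rng):
    """Sample hidden states given the sequences, then one (phi, psi) per residue."""
    K, T = len(INIT), len(aa_seq)

    def emit(k, t):
        return AA_EMIT[k].get(aa_seq[t], FLOOR) * SS_EMIT[k].get(ss_seq[t], FLOOR)

    # Forward filtering: alphas[t][k] is proportional to P(h_t = k | obs_1..t).
    alphas = []
    for t in range(T):
        if t == 0:
            alpha = [INIT[k] * emit(k, 0) for k in range(K)]
        else:
            prev = alphas[-1]
            alpha = [emit(k, t) * sum(prev[j] * TRANS[j][k] for j in range(K))
                     for k in range(K)]
        total = sum(alpha)
        alphas.append([a / total for a in alpha])  # rescale to avoid underflow

    # Backward sampling: h_T from alpha_T, then each h_t given h_{t+1}.
    states = [0] * T
    states[-1] = rng.choices(range(K), weights=alphas[-1])[0]
    for t in range(T - 2, -1, -1):
        weights = [alphas[t][k] * TRANS[k][states[t + 1]] for k in range(K)]
        states[t] = rng.choices(range(K), weights=weights)[0]

    angles = [sample_sine_model(*VM_PARAMS[k], rng) for k in states]
    return states, [(math.degrees(p), math.degrees(q)) for p, q in angles]


if __name__ == "__main__":
    rng = random.Random(1)
    states, angles = sample_angles("GPNAAL", "CCCHHH", rng)
    for s, (phi, psi) in zip(states, angles):
        print(s, round(phi, 1), round(psi, 1))
```

Hidden states are drawn by forward filtering followed by backward sampling, so each residue's angles reflect the whole observed sequence rather than only the preceding residue. Rejection sampling from a uniform proposal is the simplest correct sampler for the sine model, but it becomes inefficient for highly concentrated distributions, where a better-matched proposal would be preferable.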

Chinese Library Classification: TP311.131 [Automation and Computer Technology: Computer Software and Theory]; O6-39 [Science: Chemistry]

References (15)

1. Rohl C A, Strauss C E M, Misura K M S, Baker D. Protein structure prediction using Rosetta. Methods in Enzymology, 2004, 383: 66-93.
2. Bradley P, Misura K M S, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science, 2005, 309(5742): 1868-1871.
3. Boomsma W, Mardia K V, Taylor C C, Ferkinghoff-Borg J, Krogh A, Hamelryck T. A generative, probabilistic model of local protein structure. PNAS, 2008, 105: 8932-8937.
4. Ramachandran G N, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol, 1963, 7: 95-99.
5. Mardia K V, Taylor C C, Subramaniam G K. Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data. Biometrics, 2007, 63: 505-512.
6. Van Walle I, Lasters I, Wyns L. SABmark: a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics, 2005, 21: 1267-1268.
7. Murzin A G, Brenner S E, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol, 1995, 247: 536-540.
8. BackboneDBN [http://sourceforge.net/projects/phaistos/].
9. Hamelryck T. Mocapy: a parallelized toolkit for learning and inference in dynamic Bayesian networks. Copenhagen: University of Copenhagen, 2007.
10. CASP8 [http://predictioncenter.gc.ucdavis.edu/casp8/results.cgi].

