Abstract
Predicting the three-dimensional structure of a protein from its amino acid sequence remains one of the major unsolved problems in computational biology, and predicting the structure of Loop segments is one of its main difficulties. This paper presents a probabilistic model of protein Loop structure based on a first-order Markov model. Once trained, the model can probabilistically model and sample Loop segments given the amino acid sequence and secondary structure information. It produces real-valued dihedral angle pairs (φ, ψ), the representation used in the Ramachandran plot to describe protein structure. Training and inference are carried out with the Mocapy dynamic Bayesian network (DBN) toolkit. The model was evaluated in two experiments: a test of the Loop angle distribution, using KL divergence and angular deviation as the evaluation criteria, and de novo Loop structure prediction on 8 free-modeling targets from CASP8. Compared with the most popular existing methods, the model improves Loop prediction accuracy, so that the sampled dihedral angle pairs more closely match the distribution observed in real Loop structures, and it also improves the accuracy of the full protein backbone in de novo prediction. Moreover, because the model is a generative probabilistic model, it is in theory less biased.
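The abstract names three computational pieces: sampling (φ, ψ) pairs from a first-order Markov model conditioned on amino acid and secondary structure, and two evaluation criteria (KL divergence and angular deviation). The Python sketch below illustrates them under stated assumptions. It does not use Mocapy. The two-state model and all of its parameters are hypothetical, the von Mises emissions for φ and ψ are an assumed choice, and hidden states are sampled by forward filtering, backward sampling. Angular deviation is taken as the mean absolute wrapped angle difference, and KL divergence is computed over a pseudocount-smoothed 12×12 Ramachandran histogram. The paper's actual model structure, binning, and metric definitions may differ.

```python
import math
import random

# Hypothetical two-state toy model (state 0 ~ helix-like, state 1 ~ extended-like).
# Every number here is an illustrative assumption, not a value from the paper.
STATES = (0, 1)
START = (0.5, 0.5)
TRANS = ((0.8, 0.2),   # first-order Markov transitions P(h_i | h_{i-1})
         (0.3, 0.7))
P_AA = ({"A": 0.6, "G": 0.4}, {"A": 0.3, "G": 0.7})  # P(amino acid | h)
P_SS = ({"H": 0.7, "L": 0.3}, {"H": 0.1, "L": 0.9})  # P(secondary structure | h)
PHI = ((-1.1, 8.0), (-2.1, 4.0))  # von Mises (mean, kappa) for phi, radians
PSI = ((-0.8, 8.0), (2.3, 4.0))   # von Mises (mean, kappa) for psi, radians


def wrap(angle):
    """Map an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def sample_hidden(aas, sss, rng):
    """Forward filtering, backward sampling of hidden states given the labels."""
    def lik(s, i):
        return P_AA[s][aas[i]] * P_SS[s][sss[i]]

    alphas = []
    for i in range(len(aas)):
        if i == 0:
            a = [START[s] * lik(s, 0) for s in STATES]
        else:
            a = [lik(s, i) * sum(alphas[-1][r] * TRANS[r][s] for r in STATES)
                 for s in STATES]
        z = sum(a)
        alphas.append([x / z for x in a])  # filtered P(h_i | labels_1..i)
    # Sample the last state, then walk backwards using P(h_i | h_{i+1}, labels_1..i).
    states = [rng.choices(STATES, weights=alphas[-1])[0]]
    for i in range(len(aas) - 2, -1, -1):
        w = [alphas[i][r] * TRANS[r][states[-1]] for r in STATES]
        states.append(rng.choices(STATES, weights=w)[0])
    return states[::-1]


def sample_loop(aas, sss, seed=0):
    """Sample one (phi, psi) pair per residue given amino acids and SS labels."""
    rng = random.Random(seed)
    return [(wrap(rng.vonmisesvariate(*PHI[h])), wrap(rng.vonmisesvariate(*PSI[h])))
            for h in sample_hidden(aas, sss, rng)]


def angular_deviation(pred, true):
    """Mean absolute wrapped difference over all phi and psi angles."""
    diffs = [abs(wrap(p - t)) for pp, tt in zip(pred, true) for p, t in zip(pp, tt)]
    return sum(diffs) / len(diffs)


def ramachandran_hist(pairs, bins=12, pseudo=1.0):
    """Normalized bins x bins histogram on [-pi, pi)^2 with pseudocount smoothing."""
    counts = [[pseudo] * bins for _ in range(bins)]
    width = 2 * math.pi / bins
    for phi, psi in pairs:
        i = max(0, min(int((phi + math.pi) // width), bins - 1))
        j = max(0, min(int((psi + math.pi) // width), bins - 1))
        counts[i][j] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats for two histograms on the same grid."""
    return sum(a * math.log(a / b)
               for prow, qrow in zip(p, q) for a, b in zip(prow, qrow) if a > 0)


if __name__ == "__main__":
    aas, sss = "GAGGAG", "LLLHHL"
    sampled = sample_loop(aas, sss, seed=1)
    reference = sample_loop(aas, sss, seed=2)  # stand-in for a "true" Loop
    print("angular deviation (rad):", angular_deviation(sampled, reference))
    print("KL:", kl_divergence(ramachandran_hist(sampled), ramachandran_hist(reference)))
```

The pseudocount keeps every histogram bin non-zero, so KL(p || q) stays finite even when a sampled set leaves regions of the Ramachandran plot empty. Wrapping differences to [-π, π) keeps angles such as 179° and -179° close rather than 358° apart.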
Source
Computers and Applied Chemistry (《计算机与应用化学》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2010, No. 5, pp. 573-576 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60970055)