Abstract
Predicting protein folding rates is one of the most challenging problems in contemporary biophysics. In recent years, many researchers have worked to identify the major determinants of folding rate, and many parameters and methods have been proposed. However, interactions between amino acid residues and sequence order information have never been considered as properties for predicting folding rates. A novel method is proposed that uses Chou's pseudo-amino acid composition to extract sequence order information, applies a Monte Carlo method to select the optimal feature factors, and builds a linear regression model to predict the folding rate. The method predicts the folding rate directly from the amino acid sequence, without any explicit knowledge of tertiary structure, secondary structure, or structural class. Under Jackknife cross-validation on a dataset of 99 proteins, the largest studied to date, the predicted folding rates correlate well with the experimental values: the correlation coefficient is 0.81 and the standard error is 2.54. This accuracy exceeds that of most existing sequence-based methods, which indicates that sequence order information is an important factor in determining protein folding rates.
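The pipeline described in the abstract can be illustrated with a minimal sketch, assuming a simplified setup. The function names (`pseaac`, `jackknife_predictions`), the parameter defaults (`lam=3`, `w=0.05`), and the use of Kyte-Doolittle hydropathy as the single physicochemical property are assumptions made for illustration; the abstract does not state which properties or parameter values the authors used. The Monte Carlo feature-selection step is omitted because the abstract gives no details of it, and the regression is reduced to a single predictor for brevity.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values: a stand-in property chosen for this
# sketch, not necessarily the property set used in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(raw):
    """Rescale property values to zero mean, unit variance over the 20 residues."""
    mu = mean(raw.values())
    sd = pstdev(raw.values())
    return {aa: (v - mu) / sd for aa, v in raw.items()}


def pseaac(seq, lam=3, w=0.05, raw_property=KYTE_DOOLITTLE):
    """Return 20 composition terms followed by lam sequence-order terms."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    prop = standardize(raw_property)
    h = [prop[aa] for aa in seq]  # raises KeyError on non-standard residues
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart
    thetas = [
        mean((h[i] - h[i + k]) ** 2 for i in range(len(seq) - k))
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def jackknife_predictions(x, y):
    """Leave-one-out predictions from a one-predictor least-squares line."""
    preds = []
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]  # training set without sample i
        ys = y[:i] + y[i + 1:]
        mx, my = mean(xs), mean(ys)
        slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                 / sum((a - mx) ** 2 for a in xs))
        preds.append(my + slope * (x[i] - mx))  # predict the held-out sample
    return preds


# Hypothetical sequence, used only to show the call.
features = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
print(len(features))  # 20 composition terms + 3 sequence-order terms
```

The first 20 components reduce to the ordinary amino acid composition when `w` is zero; the remaining `lam` components are what carry the sequence order information the abstract refers to. In a fuller version, each held-out prediction would come from a multi-feature regression over the selected components, and the correlation between held-out predictions and experimental values would be reported.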
Source
《生物化学与生物物理进展》
SCIE
CAS
CSCD
PKU Core Journals (北大核心)
2010, Issue 12, pp. 1331-1338 (8 pages)
Progress In Biochemistry and Biophysics
Funding
Supported by the National Natural Science Foundation of China (30900318, 60571047)
Keywords
protein folding; prediction of folding rate; pseudo-amino acid composition; Monte Carlo method