Abstract
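A Markov chain model (MCM) is used to build a statistical model of protein solvent accessibility. Residues in a protein sequence are divided, according to their relative solvent accessibility, into two classes (surface / interior) or three classes (surface / intermediate / interior) for prediction. Different MCM orders and classification thresholds are tried during training and prediction to obtain the best classification performance. Predictions are made on two data sets under different classification thresholds, and the results are compared with existing methods such as neural networks, information theory and support vector machines. The prediction accuracy and correlation coefficient of the method are generally better than or close to those of the other methods, with best results of 78.9% for the two-class problem and 67.7% for the three-class problem. The method also has low computational complexity and a short running time.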
Residues in protein sequences can be classified into two states (exposed / buried) or three states (exposed / intermediate / buried) according to their relative solvent accessibility. A Markov chain model (MCM) was adopted for statistical modeling and prediction. Different MCM orders and classification thresholds were explored to find the best parameters. Prediction results for two different data sets and different cut-off thresholds were evaluated and compared with existing methods such as neural networks, information theory and support vector machines. The best prediction accuracies achieved by the MCM method were 78.9% for the two-state problem and 67.7% for the three-state problem. A comparison across all these results shows that the prediction accuracy and correlation coefficient of the MCM method are better than or comparable to those of the other methods. The method also has lower computational complexity and a shorter running time.
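The abstract does not spell out how the MCM is constructed, so the following is only a minimal sketch of one plausible reading: residues are labeled by thresholding relative solvent accessibility, and a separate order-k Markov chain over amino acids is estimated for each state. The names `label_states` and `MarkovChainClassifier`, the additive smoothing, the `>=` convention at the threshold, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
import math

N_AMINO_ACIDS = 20  # size of the standard amino acid alphabet


def label_states(rsa, threshold):
    """Two-state labels: 'e' (exposed) if RSA >= threshold, else 'b' (buried).

    Treating a value equal to the threshold as exposed is an assumption.
    """
    return ["e" if value >= threshold else "b" for value in rsa]


class MarkovChainClassifier:
    """Per-state order-k Markov chain over residues (illustrative sketch)."""

    def __init__(self, order=1, pseudocount=1.0):
        self.order = order
        self.pseudocount = pseudocount
        self.counts = {}        # state -> context -> residue -> count
        self.state_totals = {}  # state -> number of residues seen

    def _context(self, seq, i):
        # Up to `order` residues preceding position i (shorter at the start).
        return seq[max(0, i - self.order):i]

    def fit(self, sequences, labelings):
        for seq, labels in zip(sequences, labelings):
            for i, (res, state) in enumerate(zip(seq, labels)):
                by_ctx = self.counts.setdefault(state, {})
                by_res = by_ctx.setdefault(self._context(seq, i), {})
                by_res[res] = by_res.get(res, 0.0) + 1.0
                self.state_totals[state] = self.state_totals.get(state, 0.0) + 1.0
        return self

    def _log_score(self, state, context, res):
        # log P(state) + log P(res | context, state) with additive smoothing.
        by_res = self.counts.get(state, {}).get(context, {})
        ctx_total = sum(by_res.values())
        cond = (by_res.get(res, 0.0) + self.pseudocount) / (
            ctx_total + self.pseudocount * N_AMINO_ACIDS
        )
        prior = self.state_totals[state] / sum(self.state_totals.values())
        return math.log(prior) + math.log(cond)

    def predict(self, seq):
        states = sorted(self.state_totals)  # fixed order for tie-breaking
        return [
            max(states, key=lambda s: self._log_score(s, self._context(seq, i), res))
            for i, res in enumerate(seq)
        ]


# Toy example with made-up RSA values and a hypothetical 25% threshold.
labels = label_states([0.0, 0.9, 0.05, 0.8], threshold=0.25)
model = MarkovChainClassifier(order=1).fit(["AKAK"], [labels])
print(model.predict("AKAK"))
```

Under this reading, exploring "different orders and thresholds" would amount to refitting with other values of `order` and `threshold` and comparing held-out accuracy. A three-state variant would only need a labeling function with two cut-offs.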
Source
《生物医学工程学杂志》
EI
CAS
CSCD
Peking University Core Journals
2006, Issue 5, pp. 1109-1113 (5 pages)
Journal of Biomedical Engineering
Funding
Major Project of the Knowledge Innovation Program, University of Science and Technology of China