Abstract
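This paper analyzes and compares algorithms for predicting protein structural classes that extract feature parameters from amino-acid composition, auto-correlation functions, and auto-covariance functions. It also studies prediction algorithms that combine amino-acid composition with auto-correlation functions, and amino-acid composition with auto-covariance functions. The results for a non-homologous protein database are as follows. When amino-acid composition is combined with auto-correlation functions using the hydrophobicity values of Miyazawa and Jernigan, the overall accuracy is 95.34% in the resubstitution test on the training set, 81.92% in the Jackknife test on the training set, and 86.61% in the independent test on the test set. When amino-acid composition is combined with auto-covariance functions using the hydrophobicity values of Wold et al., the overall accuracy is 96.71% in the resubstitution test on the training set, 82.19% in the Jackknife test on the training set, and 86.88% in the independent test on the test set. These results indicate that combining amino-acid composition with auto-correlation functions, or with auto-covariance functions, can effectively improve the accuracy of structural class prediction.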
For the non-homologous protein database proposed here, predictive methods based on amino-acid composition, auto-correlation functions, and auto-covariance functions are compared, and prediction by combining these features is investigated. Predictive accuracy is markedly improved by combining amino-acid composition with auto-correlation functions, and amino-acid composition with auto-covariance functions. With amino-acid composition plus auto-correlation functions and Miyazawa and Jernigan's index, the overall resubstitution accuracy is 95.34%, the overall Jackknife accuracy is 81.92%, and the overall cross-validation accuracy is 86.61%. With amino-acid composition plus auto-covariance functions and Wold's index, the overall resubstitution accuracy is 96.71%, the overall Jackknife accuracy is 82.19%, and the overall cross-validation accuracy is 86.88%. These results show that extracting more information from the primary protein sequence is the key to improving classification accuracy.
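The feature extraction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so the definitions here are assumptions based on common usage: the auto-correlation at lag n is the mean of h_i * h_(i+n) over the sequence, and the auto-covariance is the same quantity computed on mean-centred values. The function names, the `max_lag` parameter, and the use of the Kyte-Doolittle hydropathy scale are illustrative choices only. The paper uses the indices of Miyazawa and Jernigan and of Wold et al., which are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used ONLY as a stand-in scale.
# The paper's indices (Miyazawa-Jernigan, Wold et al.) are not reproduced.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the 20 amino-acid frequencies of seq, in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def autocorrelation(values, max_lag):
    """Assumed definition: r_n = mean of v[i] * v[i + n], n = 1..max_lag."""
    length = len(values)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(values[i] * values[i + lag] for i in range(length - lag))
        / (length - lag)
        for lag in range(1, max_lag + 1)
    ]


def autocovariance(values, max_lag):
    """Assumed definition: auto-correlation of the mean-centred values."""
    mean = sum(values) / len(values)
    return autocorrelation([v - mean for v in values], max_lag)


def feature_vector(seq, scale=KYTE_DOOLITTLE, max_lag=3, use_covariance=False):
    """Concatenate composition with auto-correlation or auto-covariance terms."""
    values = [scale[a] for a in seq]
    extra = autocovariance if use_covariance else autocorrelation
    return composition(seq) + extra(values, max_lag)


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's database.
    features = feature_vector("ACDEFGHIKLMNPQRSTVWY", max_lag=3)
    print(len(features))
```

A vector of this kind (20 composition terms plus `max_lag` correlation terms) would then be passed to a classifier. The keywords name a Bayes discriminant function for that step, which this sketch does not implement.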
Source
Acta Biophysica Sinica (《生物物理学报》)
CAS
CSCD
PKU Core (北大核心)
2002, Issue 2, pp. 213-222 (10 pages)
Keywords
Non-homologous protein
Primary sequence
Prediction
Structural class
Prediction of structural classes
Amino-acid composition
Auto-correlation function
Auto-covariance function
Bayes discriminant function