Abstract
Protein fold recognition is an important approach to protein structure analysis. Using proteins with low sequence similarity as the training set, this paper extracts sequence-information frequencies and hydrophobicity information from protein sequences as fold-type features. A dataset covering 1,393 folds is built from proteins already classified in the SCOP database, and an SVM classifier is trained to predict these 1,393 folds. In the closed test, the classifier reaches an accuracy of 99.6122%. In the open test based on SCOP data, it reaches 79.6329%. In a second open test built from another authoritative benchmark, fold-level accuracy is 64.7059% and SCOP class-level accuracy is 76.4706%. The method can therefore predict protein folds effectively and provide reference topological characteristics for de novo protein structure prediction.
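The abstract does not spell out how the features are computed. As a rough illustration only, the sketch below builds a feature vector from amino-acid frequencies plus frequencies of adjacent hydrophobic/polar (H/P) pairs. The function names, the choice of hydrophobic residues, and the 24-dimensional layout are all assumptions made for this example, not the paper's actual feature set.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: one common binary hydrophobic set; the paper's scale may differ.
HYDROPHOBIC = set("AVLIMFWC")

HP_PAIRS = ("HH", "HP", "PH", "PP")


def composition(seq):
    """Relative frequency of each standard amino acid in seq."""
    counts = Counter(seq)
    n = len(seq)
    # Non-standard residues (e.g. 'X') are counted in n but get no feature slot.
    return [counts[a] / n for a in AMINO_ACIDS]


def hp_string(seq):
    """Map each residue to 'H' (hydrophobic) or 'P' (everything else)."""
    return "".join("H" if a in HYDROPHOBIC else "P" for a in seq)


def hp_pair_frequencies(seq):
    """Relative frequency of each adjacent H/P pair along the sequence."""
    hp = hp_string(seq)
    pairs = Counter(hp[i:i + 2] for i in range(len(hp) - 1))
    total = max(len(hp) - 1, 1)  # avoid division by zero for length-1 input
    return [pairs[p] / total for p in HP_PAIRS]


def fold_features(seq):
    """Hypothetical 24-dimensional feature vector: 20 composition + 4 H/P pair values."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return composition(seq) + hp_pair_frequencies(seq)


if __name__ == "__main__":
    features = fold_features("ACDA")
    print(len(features), features[:4], features[20:])
```

Vectors of this kind, one per training protein and labeled with its SCOP fold, could then be passed to any multi-class SVM implementation. The kernel and parameters used in the paper are not given in the abstract.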
Source
Chinese Journal of Bioinformatics (《生物信息学》)
2010, Issue 4, pp. 287-290 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60970055)