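Abstract
Principal component analysis (PCA), a chemometric method, was used to calculate and analyze the contributions of the 20 amino acids to the principal components for 204 proteins in four structural classes (α, β, α/β and α+β). The contributions of the 20 amino acids to the principal components differ markedly among the four classes. These contributions reflect the structural characteristics of the four classes and have underlying physical and chemical causes. We applied amino acid PCA to the prediction of protein structural class and obtained satisfactory results for all four classes. Under leave-one-out (LOO) testing, the prediction accuracies were 76.9% (α), 96.7% (β), 82.2% (α/β) and 78.3% (α+β), and the overall accuracy for the 204 proteins was 84.3%, higher than that of methods based on amino acid composition, such as the simple distance and Euclidean distance methods.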
In this work we introduce the amino acid PCA (principal component analysis) method into the study of protein structure. Protein structural classes are fuzzy sets, and amino acid sequence data may contain uncertain factors and experimental errors. The amino acid PCA method extracts the principal factors from the database and minimizes the errors in PDB sequence data. It gives better structure prediction results than methods based on amino acid composition. The method is applied to 204 proteins in four structural classes (α, β, α/β and α+β). The prediction accuracies are 76.9% for α proteins, 96.7% for β proteins, 82.2% for α/β proteins and 78.3% for α+β proteins, and the overall prediction accuracy for the 204 proteins is 84.3%, higher than the results of the simple distance method and the Euclidean distance method.
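The workflow summarized above (amino acid composition, PCA, leave-one-out testing) can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify how class membership is decided in principal component space, so the nearest-centroid rule, the number of components, the function names and the toy sequences below are all assumptions made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def pca_fit(X, n_components):
    """Return (mean, components) from an SVD of the centered data matrix."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def loo_accuracy(X, y, n_components=2):
    """Leave-one-out accuracy of a nearest-centroid rule in PC space.

    Assumption: the decision rule is hypothetical; the source only states
    that PCA was combined with leave-one-out testing.
    """
    y = np.asarray(y)
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i           # hold out protein i
        mean, comps = pca_fit(X[keep], n_components)
        scores = (X[keep] - mean) @ comps.T     # training proteins in PC space
        query = (X[i] - mean) @ comps.T         # held-out protein in PC space
        labels = np.unique(y[keep])
        centroids = np.array(
            [scores[y[keep] == c].mean(axis=0) for c in labels]
        )
        pred = labels[np.argmin(np.linalg.norm(centroids - query, axis=1))]
        hits += int(pred == y[i])
    return hits / len(X)

# Made-up toy sequences, not taken from the paper's 204-protein data set.
toy_seqs = ["AELAELAEKL", "AEELLAAKKL", "ALEALEALLA",
            "VTYVTVYTSV", "TVYVTTVYSS", "VYTVVTYTVV"]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_X = np.array([composition(s) for s in toy_seqs])
print(loo_accuracy(toy_X, toy_labels))
```

Refitting the PCA inside each leave-one-out round keeps the held-out protein from influencing the components used to classify it, which is the reason the fit sits inside the loop rather than before it.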
Source
Journal of Tianjin Normal University (Natural Science Edition)
CAS
2005, No. 1, pp. 1-5 (5 pages)
Funding
National Natural Science Foundation of China (20373048)
Tianjin Science and Technology Commission, Basic Science General Program (023618211)