Abstract
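Starting from the amino acid composition of proteins, a clustering dendrogram of proteins is obtained by information clustering. The branches of the dendrogram are found to correlate strongly with the secondary-structure content of the proteins.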
The amino acid composition of each protein is viewed as a vector, and the information gain between each pair of vectors is defined as their distance. Using an information clustering method, a tree-like diagram (dendrogram) classifying 102 proteins is derived, in which 7 main branches (A to G) are obtained. A good correlation is found between the clustering by amino acid composition and the classification by secondary-structure content. For example, the β type and α type of Nakashima's classification of folding types correspond to branches D, G and E, B, respectively; branch A corresponds to part of the α+β and α/β types; branches C and F each contain two sub-branches, corresponding to types α+β and α/β and to type α, respectively; and the ζ type corresponds to the scattered small branches in our clustering. In the calculation, to improve the statistics, the 20 amino acids are grouped into 15 categories according to the correlation between amino acids and secondary structure (namely, I, L, V; F, W; S, T; and K, R are each merged into one category). Furthermore, if the hydrophobic/hydrophilic correlation between adjacent residues is taken into account in the clustering, the result is further improved.
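The pipeline described in the abstract (reduced composition vector, information-gain distance, tree building) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' method: the 15-category alphabet follows the merges listed in the abstract, but the abstract does not give the formula for the information gain, so the sketch assumes the common entropy-increase form ΔI(a, b) = N_ab·H(a+b) − N_a·H(a) − N_b·H(b) on category counts, combined with greedy agglomerative merging of the pair with the smallest gain. The function names `composition`, `n_entropy`, `info_gain` and `cluster`, and the toy sequences, are hypothetical.

```python
import math

# Reduced 15-category alphabet (from the abstract): I/L/V, F/W, S/T and K/R
# are each merged into one category; the other 11 amino acids stay separate.
CATEGORIES = ["ILV", "FW", "ST", "KR"] + list("ACDEGHMNPQY")
CATEGORY_OF = {aa: i for i, group in enumerate(CATEGORIES) for aa in group}


def composition(sequence: str) -> list[int]:
    """Count residues of a protein sequence per reduced category."""
    counts = [0] * len(CATEGORIES)
    for residue in sequence.upper():
        idx = CATEGORY_OF.get(residue)
        if idx is not None:  # skip non-standard symbols such as X or B
            counts[idx] += 1
    return counts


def n_entropy(counts: list[int]) -> float:
    """N * H(p): total count times the Shannon entropy (nats) of p = counts / N."""
    n = sum(counts)
    return -sum(c * math.log(c / n) for c in counts if c > 0)


def info_gain(a: list[int], b: list[int]) -> float:
    """ASSUMED distance: entropy increase caused by pooling two count vectors.

    Non-negative, and zero when the two compositions are proportional.
    """
    merged = [x + y for x, y in zip(a, b)]
    return n_entropy(merged) - n_entropy(a) - n_entropy(b)


def cluster(proteins: dict[str, str]):
    """Greedy agglomerative clustering: repeatedly merge the pair with the smallest gain.

    Returns the tree as nested tuples of names, plus the merges
    (left subtree, right subtree, gain) in the order they happened.
    """
    clusters = [(name, composition(seq)) for name, seq in proteins.items()]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gain = info_gain(clusters[i][1], clusters[j][1])
                if best is None or gain < best[0]:
                    best = (gain, i, j)
        gain, i, j = best
        (tree_i, counts_i), (tree_j, counts_j) = clusters[i], clusters[j]
        merges.append((tree_i, tree_j, gain))
        pooled = [x + y for x, y in zip(counts_i, counts_j)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), pooled))
    return clusters[0][0], merges


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy = {
        "p1": "AAAALLLL",
        "p2": "ALALALALALALALAL",
        "p3": "GGGGPPPP",
        "p4": "GPGPGPGP",
    }
    tree, merges = cluster(toy)
    print(tree)
    for left, right, gain in merges:
        print(left, "+", right, f"gain={gain:.3f}")
```

In the toy run, p1/p2 and p3/p4 have proportional compositions, so each pair merges at (numerically) zero gain before the two groups are joined. Pooling counts when clusters merge keeps the distance consistent at every level of the tree. The hydrophobic/hydrophilic correlation between adjacent residues mentioned in the abstract is not modeled in this sketch.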
Source
Journal of Inner Mongolia University: Natural Science Edition
1997, No. 1, pp. 41-47 (7 pages)
Indexed in: CAS, CSCD
Funding
National Natural Science Foundation of China
Keywords
amino acid composition
secondary structure content
information clustering
protein