
Classification Modeling and Recognition of Protein Fold Type (cited by: 8)
Abstract: How a protein's amino acid sequence determines its three-dimensional structure is one of the core questions in the life sciences today. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of protein sequence-structure research. This study addresses 36 fold types that cannot each be described by a single unified model and that together account for 41.8% of the α, β, and α/β proteins in the Astral 1.65 sequence database. For each fold type, samples with less than 25% sequence identity to one another formed the training set. Hierarchical clustering on root mean square deviation (RMSD) split each fold type into several subgroups, and a profile hidden Markov model (profile-HMM) was built for each subgroup from a structural alignment produced by the multiple structural alignment algorithm MUSTANG. On a test set of 9505 Astral 1.65 samples with less than 95% sequence identity, the average sensitivity, specificity, and Matthews correlation coefficient (MCC) over the 36 fold types were 90%, 99%, and 0.95, respectively. The results show that, for fold types with too many members to support a single unified model, RMSD-based hierarchical classification modeling achieves high-accuracy recognition. The approach extends the methods available for protein fold recognition and lays a foundation for further research.
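The three evaluation measures reported in the abstract can be illustrated with a minimal Python sketch. It assumes the standard confusion-matrix definitions of sensitivity, specificity, and MCC, computed per fold type in a one-vs-rest fashion; the abstract does not spell out its formulas, and the function name `binary_metrics` and the counts below are hypothetical, not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts.

    Standard textbook definitions are assumed here; the paper's exact
    formulas are not given in the abstract.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for one fold type (illustrative only).
sens, spec, mcc = binary_metrics(tp=90, fp=10, tn=890, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

Averaging the three values over all fold types would give summary figures of the kind quoted in the abstract. Because each measure is averaged separately, the averaged MCC cannot in general be recomputed from the averaged sensitivity and specificity.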
Source: Acta Physico-Chimica Sinica (《物理化学学报》), 2009, Issue 12, pp. 2558-2564 (7 pages). Indexed in SCIE, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (30570427) and the Beijing Natural Science Foundation (4092008).
Keywords: Protein fold type; RMSD; Hierarchical clustering; Profile-HMM; Fold recognition
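The RMSD-based hierarchical clustering named in the abstract and keywords can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the linkage rule (average linkage), the distance cutoff, the function name `average_linkage_clusters`, and the RMSD values are all hypothetical, and the pairwise RMSD matrix is taken as already computed from structure superpositions.

```python
def average_linkage_clusters(dist, cutoff):
    """Agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters (average linkage, an
    assumption) until the smallest inter-cluster distance exceeds
    `cutoff`. Returns a list of clusters, each a sorted list of indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical pairwise RMSD values (in angstroms) for five domains of one fold.
rmsd = [
    [0.0, 1.2, 1.5, 6.0, 6.4],
    [1.2, 0.0, 1.1, 5.8, 6.1],
    [1.5, 1.1, 0.0, 6.2, 5.9],
    [6.0, 5.8, 6.2, 0.0, 1.3],
    [6.4, 6.1, 5.9, 1.3, 0.0],
]
subgroups = average_linkage_clusters(rmsd, cutoff=3.0)
print(subgroups)
```

In the workflow the abstract describes, each resulting subgroup would then be structurally aligned and used to build its own profile-HMM; that step depends on external tools and is not shown here.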