An SVM Classifier for Protein Fold Recognition

Abstract: Protein fold recognition is an important approach to analysing protein structure. Using a training set of proteins with low mutual sequence similarity, this paper extracts sequence-derived frequency information and hydrophobic-polar (hydrophobicity) information as fold-type features. A dataset covering 1,393 folds is built from proteins already classified in the SCOP database, and an SVM classifier is trained to predict these 1,393 folds. The classifier reaches an accuracy of 99.6122% in the closed test and 79.6329% in an open test based on SCOP data. In a further open test on another authoritative benchmark, it achieves a fold accuracy of 64.7059% and a SCOP class accuracy of 76.4706%. The method can therefore predict protein folds effectively and provide topological reference information for de novo structure prediction.
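The abstract describes the features only as sequence-derived frequency information plus hydrophobicity. The following minimal Python sketch shows one way such a feature vector could be built, under assumptions made here for illustration and not taken from the paper: the "frequency" part is taken to be the 20-residue amino-acid composition, and the hydrophobic-polar part uses a hypothetical hydrophobic group (`HYDROPHOBIC`) to compute the fraction of hydrophobic residues and the rate of hydrophobic/polar switches along the chain.

```python
# Illustrative sketch of sequence-derived fold features.
# ASSUMPTIONS (not from the paper): composition stands in for the
# "frequency" features, and HYDROPHOBIC is a hypothetical residue grouping.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVLIMFWC")  # hypothetical hydrophobic group


def composition(seq):
    """Relative frequency of each of the 20 standard amino acids."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:  # non-standard letters are ignored
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hydrophobic_polar(seq):
    """Fraction of hydrophobic residues and fraction of H/P switches between neighbours."""
    residues = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
    if not residues:
        return [0.0, 0.0]
    hp = [ch in HYDROPHOBIC for ch in residues]
    frac_h = sum(hp) / len(hp)
    if len(hp) < 2:
        return [frac_h, 0.0]
    switches = sum(1 for a, b in zip(hp, hp[1:]) if a != b)
    return [frac_h, switches / (len(hp) - 1)]


def features(seq):
    """22-dimensional vector: 20 composition values + 2 hydrophobic-polar descriptors."""
    return composition(seq) + hydrophobic_polar(seq)
```

For example, `features("MKTAYIAKQR")` yields a 22-element list that could be passed to any vector-based classifier. The residue grouping and the choice of descriptors strongly affect accuracy, so the authors' exact encoding should be taken from the full text.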
Source: Chinese Journal of Bioinformatics (《生物信息学》), 2010, No. 4, pp. 287-290 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60970055)
Keywords: protein fold recognition; SVM; de novo prediction