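Classification of Multi-class Protein Homo-oligomers Based on a Weighted Auto-correlation Function Feature Extraction Method (Cited by: 2)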

Classification of Multi-class Homo-oligomer Based on a Novel Method of Feature Extraction from Protein Primary Structure
Abstract: We propose a novel method of feature extraction from protein primary structure: each protein sequence is represented by a feature vector composed of its amino acid composition and a set of weighted auto-correlation function factors of an amino acid residue index. Combined with a support vector machine (SVM), the method was applied to classify protein homodimers, homotrimers, homotetramers and homohexamers, and gave good classification results. In the Jackknife test with the same SVM, the total accuracies of the QIANA, QIANB, MEEJ, ROBB and SNEP parameter sets built with the new method are 77.63%, 77.16%, 76.46%, 76.70% and 75.06% respectively, which are 6.39, 5.92, 5.22, 5.46 and 3.82 percentage points higher than that of the COMP set built with the conventional amino acid composition method. With the same QIANA set, the total accuracy of the SVM is 77.63%, which is 16.29 percentage points higher than that of the covariant discriminant algorithm. These results show that: (1) the novel feature extraction method is effective and feasible, and the feature vectors based on it may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches buried in the interfaces of associated subunits; (2) SVM is a powerful computational tool for classifying protein homo-oligomers.
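The feature vector described in the abstract (amino acid composition plus weighted auto-correlation factors of a residue index) can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so several things here are assumptions: the factor for lag k is taken as w/(N-k) times the sum of h(i)·h(i+k), the index is standardized over the 20 residues first, and `max_lag`, `weight` and all function names are hypothetical. The Kyte-Doolittle hydropathy scale is used only as a stand-in; it is not one of the paper's QIANA, QIANB, MEEJ, ROBB or SNEP indices.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values: a stand-in residue index (assumption),
# not one of the indices used in the paper.
KD_INDEX = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index):
    """Rescale an index to zero mean and unit variance over the 20 residues."""
    values = [index[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (index[a] - mean) / std for a in AMINO_ACIDS}


def composition(seq):
    """Return the 20 amino acid frequencies in the order of AMINO_ACIDS."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def weighted_autocorrelation(seq, index, max_lag, weight):
    """Assumed form: r_k = weight / (N - k) * sum_i h(i) * h(i + k), k = 1..max_lag."""
    h = [index[a] for a in seq]
    n = len(h)
    factors = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        total = sum(h[i] * h[i + lag] for i in range(pairs))
        factors.append(weight * total / pairs)
    return factors


def feature_vector(seq, index=KD_INDEX, max_lag=5, weight=0.1):
    """Concatenate composition (20 values) and max_lag auto-correlation factors."""
    seq = seq.upper()
    if any(a not in index for a in seq):
        raise ValueError("sequence contains a residue missing from the index")
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    scaled = standardize(index)
    return composition(seq) + weighted_autocorrelation(seq, scaled, max_lag, weight)
```

Calling `feature_vector("MKVLAAGIVGLLLAQWERTY")` returns 25 values: 20 composition terms followed by 5 auto-correlation factors. In a setup like the one the abstract describes, such vectors would then be passed to an SVM and evaluated with a leave-one-out (Jackknife) test; the choice of lag count and weight would need to be tuned and is not specified in the abstract.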
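Source: Journal of Biomedical Engineering (《生物医学工程学杂志》), indexed in EI, CAS, CSCD and the Peking University Core Journals list; 2007, Issue 4, pp. 721-726 (6 pages)
Funding: National Natural Science Foundation of China (60372085); Science and Technology Innovation Fund of Northwestern Polytechnical University (KC02)
Keywords: Feature extraction; Weighted auto-correlation function; Support vector machine (SVM); Homo-oligomers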

References (11)

  • 1. Chou KC. Review: Structural bioinformatics and its impact to biomedical science. Curr Med Chem, 2004, 11: 2105
  • 2. Chou KC. Molecular therapeutic target for type-2 diabetes. J Proteome Res, 2004, 3: 1284
  • 3. Garian R. Prediction of quaternary structure from primary structure. Bioinformatics, 2001, 17: 551-556
  • 4. Chou KC, Cai YD. Predicting protein quaternary structure by pseudo amino acid composition. Proteins: Struct Funct Genet, 2003, 53: 282
  • 5. Zhang SW, Pan Q, Chen RS, Zhang HC. Classification of protein homo-oligomers based on support vector machines [J]. Progress in Biochemistry and Biophysics, 2003, 30(6): 879-883. Cited by: 15
  • 6. Bairoch A, Apweiler R. The SWISS-PROT protein data bank and its new supplement TrEMBL. Nucleic Acids Res, 1996, 24(1): 21
  • 7. Qian N, Sejnowski TJ. Predicting the secondary structure of globular proteins using neural network models. J Mol Biol, 1988, 202(4): 865
  • 8. Meek JL, Rossetti ZL. Factors affecting retention and resolution of peptides in HPLC. J Chromatogr, 1981, 211: 15
  • 9. Robson B, Osguthorpe DJ. Refined models for computer simulation of protein folding. Applications to the study of conserved secondary structure and flexible hinge points during the folding of pancreatic trypsin inhibitor. J Mol Biol, 1979, 132(1): 19
  • 10. Sneath PH. Relations between chemical structure and biological activity in peptides. J Theor Biol, 1966, 12(2): 157

Secondary references (21)

  • 1. Chou KC. A key driving force in determination of protein structural classes. Biochem Biophys Res Commun, 1999, 264(1): 216-224
  • 2. Rost B, Sander C. Prediction of secondary structure at better than 70% accuracy. J Mol Biol, 1993, 232(2): 584-599
  • 3. Reinhardt A, Hubbard T. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res, 1998, 26(9): 2230-2236
  • 4. Emanuelsson O, Nielsen H, Brunak S, et al. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol, 2000, 300(4): 1005-1016
  • 5. Garian R. Prediction of quaternary structure from primary structure. Bioinformatics, 2001, 17(6): 551-556
  • 6. Vapnik V. The Nature of Statistical Learning Theory. New York: Springer, 1995. 1-188
  • 7. Vapnik V. Statistical Learning Theory. New York: Wiley, 1998. 1-736
  • 8. Chou KC, Elrod DW. Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochem Biophys Res Commun, 1998, 252(1): 63-68
  • 9. Chou KC, Elrod DW. Protein subcellular location prediction. Protein Eng, 1999, 12(2): 107-108
  • 10. Brown M, Grundy W, Lin D, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA, 2000, 97(1): 262-267

Literature sharing references (14)

Co-cited literature (26)

  • 1. Shi JY, Pan Q, Zhang SW, Cheng YM. Classification of protein homo-oligomers based on amino acid composition distribution [J]. Acta Biophysica Sinica, 2006, 22(1): 49-56. Cited by: 9
  • 2. Chou KC, Cai YD. Predicting protein quaternary structure by pseudo amino acid composition. Proteins: Structure, Function, Genetics, 2003, 53: 282-289
  • 3. Zhang SW, Pan Q, Zhang HC, Wu YH, Shi JY. Support vector machines for predicting protein homo-oligomers by incorporating pseudo-amino acid composition. Internet Electronic Journal of Molecular Design, 2003, 2(6): 392-402
  • 4. Zhang SW, Pan Q, Zhang HC, Shao ZC, Shi JY. Prediction of protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and Naive Bayes feature fusion. Amino Acids, 2006, 30(4): 461-468
  • 5. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta, 1975, 405: 442-451
  • 6. Fasman GD. Handbook of Biochemistry and Molecular Biology. 3rd ed. Proteins, Volume 1. Cleveland: CRC Press, 1976
  • 7. Chou PY. Amino acid composition of four classes of proteins. In: "Abstracts of Papers, Part I, Second Chemical Congress of the North American Continent," Las Vegas, 1980
  • 8. Nishikawa K, Ooi T. Correlation of the amino acid composition of a protein to its structural and biological characters. J Biochem, 1982, 91: 1821-1824
  • 9. Chou KC. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Structure, Function, Genetics, 2001, 43: 246-255
  • 10. Chou KC. Molecular therapeutic target for type-2 diabetes. J Proteome Res, 2004, 3: 1284-1288

Citing literature (2)
