Abstract
A novel method for extracting features from protein primary structure is proposed and applied to classifying protein homodimers, homotrimers, homotetramers and homohexamers. Each protein sequence is represented by a feature vector composed of its amino acid composition and a set of weighted auto-correlation function factors computed from amino acid residue indices, and the vectors are classified with a support vector machine (SVM). The resulting classification accuracies are high. In the Jackknife test with the same SVM, the total accuracies of the QIANA, QIANB, MEEJ, ROBB and SNEP parameter sets built with this method are 77.63%, 77.16%, 76.46%, 76.70% and 75.06%, respectively. These are 6.39, 5.92, 5.22, 5.46 and 3.82 percentage points higher than the total accuracy of the COMP set, which is built with the conventional method using amino acid composition alone. With the QIANA set, the SVM reaches a total accuracy of 77.63%, which is 16.29 percentage points higher than that of the covariant discriminant algorithm. These results show that (1) the new feature extraction method is effective and feasible, because the resulting feature vectors may contain more quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches buried in the interfaces of associated subunits; and (2) SVM is a powerful computational tool for classifying protein homo-oligomers.
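The abstract does not give the exact formula for the feature vector. The Python sketch below shows one common way to combine amino acid composition with lag-based auto-correlation factors of a residue index. It makes several assumptions that do not come from the paper. The Kyte-Doolittle hydropathy scale stands in for the QIANA, QIANB, MEEJ, ROBB and SNEP index sets. The auto-correlation at lag k is taken as the mean product of index values k residues apart. A single hypothetical scalar `weight` scales every factor. The names `composition`, `weighted_autocorrelation` and `feature_vector` are illustrative only.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. This is an ASSUMED stand-in index,
# not one of the paper's parameter sets.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq: str) -> List[float]:
    """Relative frequency of each of the 20 standard amino acids."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def weighted_autocorrelation(seq: str, index: Dict[str, float],
                             max_lag: int, weight: float) -> List[float]:
    """Return weight * R_k for k = 1..max_lag, where (ASSUMED form)
    R_k = (1 / (N - k)) * sum_{i=1}^{N-k} h(s_i) * h(s_{i+k})."""
    if not 1 <= max_lag < len(seq):
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    h = [index[aa] for aa in seq]  # raises KeyError for non-standard residues
    n = len(h)
    factors = []
    for k in range(1, max_lag + 1):
        r_k = sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        factors.append(weight * r_k)
    return factors


def feature_vector(seq: str, index: Dict[str, float],
                   max_lag: int = 10, weight: float = 0.1) -> List[float]:
    """20 composition values followed by max_lag weighted factors.
    The default max_lag and weight are hypothetical, not the paper's."""
    seq = seq.upper()
    return composition(seq) + weighted_autocorrelation(seq, index, max_lag, weight)


if __name__ == "__main__":
    vec = feature_vector("MKTAYIAKQRQISFVKSHFSRQ", KYTE_DOOLITTLE, max_lag=5)
    print(len(vec))  # 20 composition values + 5 factors = 25
```

In practice, vectors like these, one per sequence, would be passed to an SVM and evaluated with a leave-one-out (Jackknife) test. Residue indices are often standardized over the 20 amino acids before computing the factors. The sketch omits that step because the abstract does not say whether the paper did so.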
Source
《生物医学工程学杂志》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2007, No. 4, pp. 721-726 (6 pages)
Journal of Biomedical Engineering
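Funding
Supported by the National Natural Science Foundation of China (60372085)
Supported by the Science and Technology Innovation Foundation of Northwestern Polytechnical University (KC02)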
Keywords
Feature extraction
Weighted auto-correlation function
Support vector machine (SVM)
Homo-oligomers