Abstract
The subcellular location of a protein is important for determining its function: it indicates in what kind of cellular environment the protein interacts with other proteins and with other molecules. Knowing where proteins are localized within cellular compartments also helps in understanding, at the cellular level, the intricate pathways that regulate biological processes. Given the flood of protein sequences generated in the post-genomic era, automated methods that annotate subcellular locations quickly and reliably are urgently needed. Here, a novel prediction approach was developed that incorporates protein evolutionary conservation information. Based on the concept of Chou's pseudo amino acid composition (PseAAC), a conservation score is calculated for every residue of a sequence with an improved evolutionary conservation algorithm, so that each protein can be represented by a feature vector built from wavelet multi-scale energy (MSE). In addition, each protein is represented by feature vectors from three other extraction methods: amino acid composition (AAC), the weighted auto-correlation function, and moment descriptors. The feature vectors are input into multi-class support vector machines to predict 12 kinds of subcellular locations, and the outputs of the four feature-specific classifiers are fused through a product rule. Compared with previously reported results, the method achieved higher success rates in both the jackknife cross-validation test and the independent dataset test, suggesting that incorporating evolutionary conservation information and fusing multi-feature classifiers is effective for subcellular location prediction and may complement existing methods.
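As an illustration of two steps named in the abstract, the following minimal Python sketch computes an amino acid composition (AAC) vector and fuses class probabilities from several classifiers with a product rule. It is not the authors' implementation: the function names, the 20-letter residue alphabet ordering, the renormalization of fused scores, and the example probabilities are all assumptions made for this sketch, and the classifier outputs are assumed to already be per-class probabilities.

```python
from math import prod

# The 20 standard amino acids, in an arbitrary (alphabetical) order assumed here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional AAC vector: the fraction of each standard residue."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def product_rule_fusion(prob_rows):
    """Fuse classifiers by multiplying their class probabilities class-wise.

    prob_rows holds one list of class probabilities per classifier.
    Returns (index of the winning class, renormalized fused scores).
    """
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("all classifiers must score the same classes")
    fused = [prod(row[k] for row in prob_rows) for k in range(n_classes)]
    total = sum(fused)
    if total > 0:
        fused = [f / total for f in fused]  # renormalize so scores sum to 1
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical outputs of four feature-specific classifiers over 3 classes
# (the paper predicts 12 locations; 3 are used here only to keep it short).
example_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
    [0.4, 0.4, 0.2],
]
predicted_class, fused_scores = product_rule_fusion(example_probs)
```

With a product rule, a class that any single classifier scores near zero is effectively vetoed, so in practice the probabilities are often smoothed before fusion; whether the paper does so is not stated in the abstract.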
Source
Acta Biophysica Sinica (《生物物理学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2009, Issue 2, pp. 125-132 (8 pages)
Funding
Supported by a grant from the Young College Teachers Projects in Henan Province (2007-335)
Keywords
Evolutionary information
Multi-scale energy
Weighted auto-correlation function
Moment descriptor
Fusion
Subcellular location