Abstract
Objective: To identify all ORFs that potentially encode secreted proteins in the Escherichia coli UTI89 genome, and to carry out a preliminary characterisation of the secreted proteins and their signal peptides. Methods: All 5211 ORFs were analysed with protein prediction tools including SignalP 3.0, TatP 1.0 and SecretomeP 2.0. The basic features of the predicted signal peptides and secreted proteins were analysed statistically. Secreted proteins sharing the same signal sequence were aligned with the program Blast 2 Sequences to assess homology. Results: The E. coli UTI89 genome contains 432 ORFs encoding Sec pathway secreted proteins, 19 ORFs encoding Tat pathway secreted proteins and 386 ORFs encoding non-classically secreted proteins. The mean length of the signal peptides is 25.5 aa, and the mean length of the secreted proteins is 282.8 aa. The three most frequent amino acids in the signal peptides are, in order, L, A and S. Only two signal peptides have identical amino acid sequences, and the corresponding secreted proteins are highly homologous. Conclusions: 837 ORFs in the E. coli UTI89 genome may encode secreted proteins. Most secreted proteins are shorter than 500 aa. The amino acid composition of the signal peptides is relatively conserved, and most residues are hydrophobic. Signal peptide sequences vary widely, but proteins carrying the same signal peptide are likely to be encoded by homologous genes.
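The descriptive statistics reported in the abstract (mean signal peptide length, most frequent residues, and detection of identical signal peptides) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `summarize_signal_peptides`, the ORF identifiers and the toy sequences are hypothetical, and real input would come from the output of the prediction tools.

```python
from collections import Counter, defaultdict


def summarize_signal_peptides(peptides):
    """Summarise signal peptides given as {protein_id: sequence}.

    Returns (mean_length, top_three_residues, shared), where `shared`
    maps each signal peptide sequence found in more than one protein
    to the sorted list of protein IDs carrying it.
    """
    if not peptides:
        raise ValueError("no signal peptides supplied")

    # Mean signal peptide length in amino acids.
    lengths = [len(seq) for seq in peptides.values()]
    mean_length = sum(lengths) / len(lengths)

    # Pooled residue counts across all signal peptides.
    counts = Counter("".join(peptides.values()))
    top_three = [aa for aa, _ in counts.most_common(3)]

    # Group protein IDs by identical signal peptide sequence.
    groups = defaultdict(list)
    for protein_id, seq in peptides.items():
        groups[seq].append(protein_id)
    shared = {seq: sorted(ids) for seq, ids in groups.items() if len(ids) > 1}

    return mean_length, top_three, shared


# Hypothetical toy sequences, for illustration only (not data from the paper).
toy = {
    "orfA": "MKLLAALLSA",
    "orfB": "MKKLLASLLA",
    "orfC": "MKLLAALLSA",
}
mean_length, top_three, shared = summarize_signal_peptides(toy)
print(mean_length, top_three, shared)
```

Proteins grouped under the same key in `shared` would be the natural candidates for a pairwise alignment such as the Blast 2 Sequences comparison mentioned in the Methods.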
Source
China Biotechnology (《中国生物工程杂志》)
CAS
CSCD
Peking University Core Journals
2007, Issue 7, pp. 100-105 (6 pages)