Abstract
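Because the subcellular location of a protein correlates strongly with its primary sequence, the increment of diversity was used to describe the similarity between proteins in amino acid composition and dipeptide composition, and a modified Mahalanobis discriminant (referred to here as the IDQD method) was applied to predict the subcellular location of mycobacterial proteins. Protein datasets with different levels of sequence similarity were evaluated with the jackknife test. The results show that when the sequence similarity of the dataset is at most 70%, the prediction accuracy remains stable at about 75%. For the full set of 852 proteins, the prediction success rate reaches 87.7%, which is better than the accuracy of existing algorithms and indicates that IDQD is an effective method for predicting the subcellular location of mycobacterial proteins.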
The subcellular location of a protein correlates with its primary sequence. Using amino acid composition and n-gap dipeptide composition as parameters, a model that combines the increment of diversity with a modified Mahalanobis discriminant, called the IDQD model, is used to predict four subcellular locations of mycobacterial proteins. Jackknife cross-validation on datasets with sequence identity of at most 70% gives overall prediction success rates of approximately 75%. The overall accuracy for the full set of 852 proteins is 87.7%, which is higher than that of other methods. The results indicate that the IDQD model can effectively predict the subcellular location of mycobacterial proteins.
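The abstract names the increment of diversity but does not give its formula. The sketch below assumes the definition commonly used in the increment-of-diversity literature: a diversity measure D(X) = N ln N − Σ nᵢ ln nᵢ over a count vector, and ID(X, Y) = D(X + Y) − D(X) − D(Y). The function names, the 20-letter amino acid alphabet, and the `gap` parameter are illustrative assumptions rather than the authors' implementation, and the modified Mahalanobis discriminant step is not shown.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Count each standard amino acid in a sequence (20 values)."""
    counts = Counter(seq)
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def gap_dipeptides(seq, gap=0):
    """Count residue pairs separated by `gap` positions (400 values).

    gap=0 gives ordinary adjacent dipeptides.
    """
    pairs = Counter(
        (seq[i], seq[i + gap + 1]) for i in range(len(seq) - gap - 1)
    )
    return [pairs.get((a, b), 0) for a in AMINO_ACIDS for b in AMINO_ACIDS]


def diversity(counts):
    """Assumed diversity measure: N ln N - sum(n_i ln n_i)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)
```

Under this definition, two identical count vectors give an increment of zero, and vectors with no shared categories give the largest increment, so the value can serve as a dissimilarity score between a query protein and a class-level count vector.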
Source
Chinese Journal of Bioinformatics (《生物信息学》)
2009, Issue 4, pp. 252-254 (3 pages)
Funding
Research Start-up Fund for Outstanding Graduates, University of Electronic Science and Technology of China
Keywords
Mycobacterium
Amino acid
Dipeptide
Increment of diversity
Mahalanobis Discriminant