Abstract
[Objective] To predict the subcellular location of proteins, providing a basis for further study of their biological function. [Methods] Amino acid composition, dipeptide composition, and pseudo-amino acid composition of the protein sequence were used as sequence features, and a K-nearest neighbor (KNN) classification algorithm improved by BLAST alignment was used to predict the subcellular location of each protein sequence. [Results] Under the jackknife test, the success rates for the three features were 91.5%, 91.5%, and 89.3% on data set CH317, and 93.9%, 92.9%, and 89.8% on data set ZD98. [Conclusion] The KNN algorithm improved by BLAST alignment is an effective method for predicting the subcellular location of proteins.
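The feature extraction and jackknife evaluation named in the abstract can be illustrated with a minimal sketch. It covers only amino acid composition, dipeptide composition, and a plain KNN classifier scored by leave-one-out (jackknife) testing. The abstract does not say how the BLAST alignment is combined with KNN, so that step is not reproduced here, and pseudo-amino acid composition is omitted. All function names, the Euclidean distance, and the default k=1 are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product
import math

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Return a 20-dimensional vector of amino acid frequencies."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent residue-pair frequencies."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs[d] for d in DIPEPTIDES) or 1
    return [pairs[d] / total for d in DIPEPTIDES]


def knn_predict(query, train_vectors, train_labels, k=1):
    """Majority vote among the k nearest training vectors.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: math.dist(query, train_vectors[i]),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def jackknife_accuracy(vectors, labels, k=1):
    """Leave-one-out test: predict each sample from all the others."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(vectors[i], rest_vectors, rest_labels, k) == labels[i]:
            correct += 1
    return correct / len(vectors)
```

To use it, one would build a feature vector per sequence with `aa_composition` or `dipeptide_composition`, pair the vectors with their known location labels, and pass both lists to `jackknife_accuracy` to obtain a success rate comparable in kind to those reported above.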
Source
Microbiology China (《微生物学通报》)
CAS
CSCD
PKU Core (北大核心)
2016, No. 10, pp. 2298-2305 (8 pages)
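Funding
Fundamental Research Funds for the Central Universities (No. KYZ201668)
Natural Science Foundation of Jiangsu Province (No. BK2012363, BK20140002)
Jiangsu Province Postdoctoral Research Project (No. 1302038B)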
Keywords
Subcellular location
K-nearest neighbor (KNN)
BLAST
Protein sequence features