Abstract
An AdaBoost ensemble of several similarity-alignment K-nearest neighbor (KNN) classifiers is proposed for predicting protein subcellular localization. The similarity-alignment KNN classifiers use amino acid composition, dipeptide composition, and pseudo amino acid composition, respectively, as protein sequence features, and in the KNN decision stage BLAST alignment determines the subcellular location assigned to the protein. Under the jackknife test, the highest prediction success rates obtained with the three features on the CH317 and Gram1253 data sets are 92.4% and 93.1%, respectively. These results indicate that an AdaBoost ensemble of BLAST-improved KNN classifiers is an effective method for predicting protein subcellular localization.
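Two of the three sequence features named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. The function names, the handling of non-standard residues, and the frequency normalization below are assumptions made for illustration; the abstract does not specify them. Pseudo amino acid composition is omitted because it depends on physicochemical parameters that are not given here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Map each of the 400 ordered residue pairs to a fixed vector position.
DIPEPTIDE_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: non-standard residues (e.g. X, B, Z) are ignored.
    """
    residues = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of overlapping dipeptide frequencies.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDE_INDEX)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DIPEPTIDE_INDEX:
            counts[DIPEPTIDE_INDEX[pair]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Each function maps a sequence of any length to a fixed-length vector, which is what allows a distance-based classifier such as KNN to compare proteins of different lengths.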
Authors
薛卫
王雄飞
赵南
杨荣丽
洪晓宇
Wei Xue, Xiongfei Wang, Nan Zhao, Rongli Yang, and Xiaoyu Hong (School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China)
Source
《生物工程学报》
CAS
CSCD
PKU Core Journals
2017, Issue 4, pp. 683-691 (9 pages)
Chinese Journal of Biotechnology
Funding
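Fundamental Research Funds for the Central Universities (No. KYZ201668)
Natural Science Foundation of Jiangsu Province (No. BK2012363)
National Key Technology R&D Program of China (No. 2015BAK36B05)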