Abstract
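We propose a distance formula based on sequence and PPI features that characterizes the similarity of two proteins by combining the amino acid composition of their sequences with their protein-protein interaction (PPI) partners and interaction strengths. Building on this distance, we propose a k-nearest neighbor (KNN) algorithm for predicting protein subcellular localization. Performance was evaluated with the leave-one-out test. The results show that adding PPI features to sequence features clearly improves subcellular localization prediction, and that the KNN algorithm based on this distance also outperforms an SVM algorithm using the same features, indicating that the algorithm can predict protein subcellular localization accurately and effectively.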
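The abstract does not state the distance formula itself. As a rough illustration only, the sketch below combines a Euclidean distance between 20-dimensional amino acid composition vectors with a weighted Jaccard dissimilarity over interaction partners and their scores, so that both the identity of the partners and the interaction strength contribute. The weighted Jaccard term, the additive combination and the `weight` parameter are assumptions made for this example, not the authors' formula.

```python
import math
from collections import Counter

# The 20 standard amino acids; other letters (X, B, U, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (20-dimensional vector)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def sequence_distance(seq_a: str, seq_b: str) -> float:
    """Euclidean distance between the two amino acid composition vectors."""
    ca, cb = aa_composition(seq_a), aa_composition(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


def ppi_similarity(partners_a: dict[str, float], partners_b: dict[str, float]) -> float:
    """Weighted Jaccard similarity of interaction partners (assumed form).

    Each dict maps a partner protein id to an interaction score in [0, 1].
    Proteins with no recorded partners get similarity 0.
    """
    keys = set(partners_a) | set(partners_b)
    num = sum(min(partners_a.get(k, 0.0), partners_b.get(k, 0.0)) for k in keys)
    den = sum(max(partners_a.get(k, 0.0), partners_b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def combined_distance(seq_a: str, partners_a: dict[str, float],
                      seq_b: str, partners_b: dict[str, float],
                      weight: float = 1.0) -> float:
    """Hypothetical sequence + PPI distance: composition distance plus
    `weight` times the PPI dissimilarity (1 - similarity)."""
    return sequence_distance(seq_a, seq_b) + weight * (1.0 - ppi_similarity(partners_a, partners_b))
```

For example, `combined_distance("MKTAYIAK", {"P1": 0.9, "P2": 0.4}, "MKTAYLAK", {"P1": 0.8})` adds a small composition term (the sequences differ in one residue) to a PPI term driven by the shared partner `P1` and the unshared partner `P2`. How to balance the two terms, here a single `weight`, is exactly the kind of choice the paper's formula would have to fix.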
Information on protein subcellular localization is indispensable for studying protein function, since a protein can perform its function only after it has been correctly transported to a specific subcellular compartment. Accurate prediction of protein subcellular localization is therefore very important in biological studies. In contrast to sequence features (e.g. amino acid composition), which are widely used in subcellular localization prediction, features extracted from protein-protein interactions (PPI) have been largely ignored, although they reflect the co-localization of different proteins. In this study, we propose a novel distance formula based on both protein sequence and PPI features, which measures the similarity of proteins by incorporating amino acid composition, PPI partners and the corresponding interaction scores. Based on this distance formula, we further introduce a k-nearest neighbor (KNN) algorithm for predicting subcellular localization. The results of a leave-one-out test on a benchmark dataset show that PPI features significantly improve the performance of protein subcellular localization prediction. Meanwhile, the KNN algorithm also outperforms an SVM algorithm adopting the same features, suggesting the effectiveness of the proposed algorithm for predicting protein subcellular localization.
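A minimal sketch of KNN prediction with leave-one-out evaluation, assuming a precomputed pairwise distance matrix (for example, one produced by a sequence-plus-PPI distance) and a plain majority vote. The tie-breaking rule (smallest summed distance among tied labels) is a hypothetical choice; the abstract states neither the paper's voting scheme nor its value of k.

```python
from collections import Counter


def knn_predict(dist_row, labels, k, exclude=None):
    """Predict a localization label by majority vote of the k nearest proteins.

    dist_row[j] is the distance from the query protein to protein j.
    Index `exclude` (the query itself in leave-one-out) is skipped.
    """
    candidates = [j for j in range(len(labels)) if j != exclude]
    nearest = sorted(candidates, key=lambda j: dist_row[j])[:k]
    votes = Counter(labels[j] for j in nearest)
    best = max(votes.values())
    tied = [lab for lab, v in votes.items() if v == best]
    # Hypothetical tie-break: prefer the tied label whose neighbors are closest overall.
    return min(tied, key=lambda lab: sum(dist_row[j] for j in nearest if labels[j] == lab))


def leave_one_out_accuracy(dist, labels, k):
    """Classify each protein using all the others as the training set."""
    correct = sum(
        knn_predict(dist[i], labels, k, exclude=i) == labels[i]
        for i in range(len(labels))
    )
    return correct / len(labels)


# Toy example: two well-separated groups of two proteins each.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
print(leave_one_out_accuracy(dist, labels, k=1))  # 1.0
print(leave_one_out_accuracy(dist, labels, k=3))  # 0.0
```

Excluding index `i` from its own neighbor list is what makes this a leave-one-out test: each protein is predicted only from the remaining proteins. The toy run also shows why k matters. With only two proteins per class, k = 3 lets the other class outvote the single remaining same-class neighbor.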
Source
《电子科技大学学报》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2015, Issue 3, pp. 467-470 (4 pages)
Journal of University of Electronic Science and Technology of China
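Funding
National Natural Science Foundation of China (61101061, 31100955)
Fundamental Research Funds for the Central Universities (WK2100230011)
Specialized Research Fund for the Doctoral Program of Higher Education (20113402120028)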
Keywords
bioinformatics
K-nearest neighbor algorithm
protein-protein interaction
subcellular localization