Abstract
Phosphorylation is one of the most important post-translational modifications of eukaryotic proteins. Experimental identification of protein kinase (PK) substrates and their phosphorylation sites is time-consuming and is often restricted by experimental conditions. Machine learning methods that predict kinase-specific phosphorylation sites from the primary sequence are therefore valuable: they provide fast, automatic annotation and can guide subsequent experimental work. In this paper, we present a k-nearest neighbor (k-NN) method based on the Euclidean distance, with a modified decision function, for phosphorylation site prediction. The feature vectors consist of average similarity scores derived from the BLOSUM62 matrix. Tests on several PK families show that, when sensitivity (Sn) and specificity (Sp) are considered together, the method outperforms the commonly used tools Scansite, KinasePhos and NetPhosK. The method is also simple, efficient and robust.
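As a rough illustration of the ingredients named in the abstract, the sketch below shows a plain k-NN classifier with Euclidean distance, together with the usual definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP). It is not the paper's method: the abstract does not specify the modified decision function or how the BLOSUM62-based average scores are computed, so this sketch assumes a simple majority vote and takes the feature vectors as already computed. The function names and the toy two-dimensional data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training points.

    Assumption: plain majority vote. The paper uses a modified decision
    function that the abstract does not describe.
    """
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sn_sp(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1 and 0.

    Assumes y_true contains at least one positive and one negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = phosphorylation site, 0 = non-site.
# Real feature vectors would hold BLOSUM62-based average scores.
train_X = [(0.9, 0.8), (1.0, 0.7), (0.8, 1.0), (0.1, 0.2), (0.0, 0.1), (0.2, 0.0)]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [(0.85, 0.9), (0.1, 0.1), (0.6, 0.6), (0.3, 0.3)]
test_y = [1, 0, 1, 0]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
sensitivity, specificity = sn_sp(test_y, predictions)
```

Reporting Sn and Sp together, as the abstract does, matters because either one can be pushed up on its own by shifting the decision rule toward predicting more or fewer sites.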
Source
《中国生物医学工程学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2007, Issue 3, pp. 404-408 (5 pages)
Chinese Journal of Biomedical Engineering
Funding
Key Project for High-Level University Construction, University of Science and Technology of China