Abstract
Objective: To predict phosphorylation sites in protein sequences using a support vector machine (SVM). Methods: Human phosphorylation data were extracted from the latest release of the Swiss-Prot database to build the sample sets. Four types of sequence features were extracted and screened, and an SVM was then trained on them for prediction. Results: Ten-fold cross-validation gave recognition accuracies of 93.00%, 82.25%, and 81.50% for phosphorylation sites on serine, threonine, and tyrosine, respectively, with Matthews correlation coefficients (MCC) of 0.86, 0.65, and 0.63. On the independent test set, the overall recognition accuracy was better than that of RF-Phos, a comparable prediction tool. The model performed best on the threonine set, where accuracy reached 86.59% and the MCC was 0.63. Conclusion: Locating phosphorylation sites with an SVM that integrates sequence features provides reasonably accurate information for further biological experiments.
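The two metrics reported in the abstract, accuracy and MCC, can both be computed from the four counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of sites (modified and unmodified) classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, NOT taken from the paper:
# 45 true positives, 40 true negatives, 10 false positives, 5 false negatives.
tp, tn, fp, fn = 45, 40, 10, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is presumably why the abstract reports both values for each residue type.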
Authors
Deng Xiao-zheng; Xu Rui-jie; Chen Yu (School of Data Science and Software Engineering, Qingdao University, Qingdao, Shandong 266071)
Source
Biological Chemical Engineering (《生物化工》)
2020, Issue 3, pp. 45-49 (5 pages)
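Funding
Major Science and Technology Innovation Project of the Shandong Provincial Key Research and Development Program (2019JZZY020101).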
Keywords
Phosphorylation
Support vector machine
Physicochemical property
F-value test