Machine learning platform for protein function sites prediction

Cited by: 3
Abstract: The study of protein function underpins our understanding of living systems, and machine learning is already widely applied in this field. This paper builds a general-purpose platform, based on the support vector machine (SVM), for predicting protein functional sites. The platform first extracts non-homologous protein sequences and then encodes them as feature vectors covering basic sequence information, physicochemical properties, structural information, and sequence conservation. The encoded samples serve as training data for the SVM, and each trained model is assessed by sensitivity, specificity, Matthews correlation coefficient, accuracy, and the ROC curve. Training and testing are repeated until the model with the best evaluation metrics is found, and that model is then used to predict functional sites on protein sequences. Beyond functional site prediction, the platform can also be applied to the prediction and analysis of disease-related single nucleotide polymorphisms (SNPs), protein domain prediction, and interactions between biomolecules.
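The threshold-based evaluation metrics named in the abstract can be computed from a confusion matrix. The following is a minimal sketch, not the platform's own code: the function name `evaluate_predictions` and the label convention (1 = functional site, 0 = non-site) are assumptions made for illustration, and the ROC curve is left out because it needs continuous decision scores rather than hard labels.

```python
import math


def evaluate_predictions(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy and MCC for binary labels.

    Assumed convention (hypothetical, not from the paper):
    1 = functional site, 0 = non-site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels for eight residues, purely illustrative.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(evaluate_predictions(truth, predicted))
```

In a model-selection loop like the one the abstract describes, a function of this kind would be called once per candidate SVM model, and the model with the best scores retained. Which metric decides "best" is not stated in the abstract.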
Source: Chinese Journal of Bioinformatics (《生物信息学》), 2010, Issue 1, pp. 12-15 (4 pages)
Funding: National Natural Science Foundation of China (60671018, 60771024)
Keywords: protein function sites prediction; machine learning; support vector machine