Predicting the Subclasses of the Six Enzyme Classes with the Random Forest Algorithm

Predicting Enzyme Subclasses by Using Random Forest
Abstract: Starting from the protein sequence, each sequence is first divided into 15 equal-length segments to obtain increment of diversity values, low-frequency power spectral density values, N-terminal and C-terminal scoring values, and motif frequencies, which are combined into a single feature vector representing the sequence information. The Random Forest algorithm is then used to predict the subclasses within each of the six enzyme classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. The overall accuracies are 90.86%, 95.24%, 96.42%, 98.60%, 97.53%, and 98.03%, respectively. Compared with earlier work on the same database, the results for transferases and hydrolases are slightly lower, while those for the other four classes are better.
Author: Wang Ying (王莹)
Source: Yinshan Academic Journal (Natural Science Edition), 2014, No. 2, pp. 22-25 (4 pages)
Keywords: motif; matrix scoring function value; increment of diversity value; Random Forest algorithm; enzyme subclass prediction
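
As an illustration of one of the features named in the abstract, the following is a minimal sketch of splitting a sequence into 15 segments and computing an increment of diversity value for each segment. It assumes the commonly used definition D(X) = N log N - Σ n_i log n_i and ID(X, S) = D(X + S) - D(X) - D(S), applied to amino acid counts. The record does not specify how the paper computes its features, so the function names, the use of amino acid composition, and the reference count vector are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def split_into_segments(sequence, n_segments=15):
    """Split a sequence into n_segments contiguous pieces of near-equal length."""
    length = len(sequence)
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def composition(sequence):
    """Count each standard amino acid in the sequence (assumed feature basis)."""
    counts = Counter(sequence)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def diversity(counts):
    """D(X) = N*log(N) - sum(n_i*log(n_i)), treating 0*log(0) as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(counts_x, counts_s):
    """ID(X, S) = D(X + S) - D(X) - D(S); 0 when the two sources are proportional."""
    merged = [x + s for x, s in zip(counts_x, counts_s)]
    return diversity(merged) - diversity(counts_x) - diversity(counts_s)


def segment_id_features(sequence, reference_counts, n_segments=15):
    """One ID value per segment against a (hypothetical) class reference composition."""
    return [
        increment_of_diversity(composition(segment), reference_counts)
        for segment in split_into_segments(sequence, n_segments)
    ]
```

In a pipeline of the kind the abstract describes, such values would be concatenated with the other feature groups and passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`; that choice of library is an assumption, not something stated in the record.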