Application of Random Forests Algorithm in β-Hairpin Motif Prediction

Abstract: Building on earlier work on recognizing β-hairpin motifs, this paper applies a new prediction method, the random forests algorithm. It uses the increment of diversity, position weight matrix scores, and predicted secondary structure information as feature parameters to predict β-hairpin motifs with loop lengths of 2 to 8 amino acid residues in the ArchDB40 dataset. The dataset was divided evenly into five parts: one part was used as the training set and the other four as the test set. In the independent test, the overall prediction accuracy was 79.4% and the Matthews correlation coefficient was 0.48. In addition, when predicting β-hairpin motifs in the ArchDB40 dataset with the same feature parameters and the same testing method, the random forests algorithm outperformed the support vector machine (SVM).
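The abstract names the increment of diversity and a position weight matrix score as features and reports accuracy and the Matthews correlation coefficient. The sketch below shows how these quantities are commonly computed; it is not the author's code. The increment of diversity follows the standard definitions D(X) = N ln N - sum_i n_i ln n_i and ID(X, Y) = D(X + Y) - D(X) - D(Y). The log-odds PWM with a uniform background and a pseudocount of 1, the amino acid alphabet order, all function names, and the toy sequences are assumptions made for illustration. The random forest itself is not implemented here.

```python
import math
from collections import Counter

# Standard 20 amino acids; the alphabet order is an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq, alphabet=AMINO_ACIDS):
    """Return residue counts of `seq` over `alphabet`; letters outside it are ignored."""
    counts = Counter(seq)
    return [counts.get(a, 0) for a in alphabet]


def diversity(counts):
    """Diversity measure D(X) = N ln N - sum_i n_i ln n_i (0 ln 0 treated as 0)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when X and Y are proportional
    and grows as the two count vectors become less alike."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def build_pwm(segments, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Hypothetical log-odds position weight matrix from equal-length training
    segments (residues assumed to be in `alphabet`), using a uniform
    background frequency of 1/len(alphabet)."""
    background = 1.0 / len(alphabet)
    total = len(segments) + pseudocount * len(alphabet)
    pwm = []
    for i in range(len(segments[0])):
        column = Counter(s[i] for s in segments)
        pwm.append({a: math.log((column.get(a, 0) + pseudocount) / total / background)
                    for a in alphabet})
    return pwm


def pwm_score(pwm, segment):
    """Sum the log-odds of each residue at its position; unknown residues score 0."""
    return sum(col.get(res, 0.0) for col, res in zip(pwm, segment))


def accuracy_and_mcc(tp, tn, fp, fn):
    """Overall accuracy and Matthews correlation coefficient from a 2x2 confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the paper.
    hairpin_like = composition("VKVTVNGKTYEV")
    print(increment_of_diversity(hairpin_like, composition("AEELLKKAEELLKK")))
    pwm = build_pwm(["VNGK", "VDGK", "TNGK"])
    print(pwm_score(pwm, "VNGK"), pwm_score(pwm, "AAAA"))
    print(accuracy_and_mcc(tp=40, tn=120, fp=20, fn=20))
```

In a pipeline of the kind the abstract describes, values such as ID scores against reference compositions, a PWM score, and an encoding of the predicted secondary structure would form one feature vector per candidate segment. These vectors could then be passed to any random forest implementation, for example scikit-learn's RandomForestClassifier. The abstract does not state how the paper encodes secondary structure or configures the forest.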

Author: Jia Shaochun (贾少春)

Institution: Xinzhou Teachers University (忻州师范学院)

Source: Journal of Xinzhou Teachers University (《忻州师范学院学报》), 2015, No. 5, pp. 6-9, 28 (5 pages)

Keywords: random forests algorithm; increment of diversity; scoring matrix; β-hairpin motif

Classification number: O24 [Science: Computational Mathematics]
