Abstract
Building on earlier work on recognizing β-hairpin motifs, this paper proposes a new prediction method based on the random forest algorithm. The increment of diversity, position weight matrix scores, and predicted secondary structure information are used as feature parameters to predict β-hairpin motifs with loop lengths of 2 to 8 amino acid residues in the ArchDB40 dataset. The dataset was divided evenly into five subsets: one subset was used as the training set and the remaining four as the test set. In this independent test, the overall prediction accuracy was 79.4% and the Matthews correlation coefficient was 0.48. In addition, when the same feature parameters and the same testing method were used to predict β-hairpin motifs in the ArchDB40 dataset, the random forest algorithm performed better than the support vector machine (SVM).
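The abstract names the increment of diversity (ID) as a feature but does not give its formula. The sketch below uses the commonly cited definition, with diversity D(X) = N log N - Σ n_i log n_i and ID(X, Y) = D(X+Y) - D(X) - D(Y). Applying it to amino-acid composition counts, and the names `composition` and `increment_of_diversity`, are illustrative assumptions, not the authors' code. In this kind of scheme a query fragment's composition is typically compared against a "standard" source pooled from positive (hairpin) and negative training fragments, and a smaller ID indicates greater similarity.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Count vector of the 20 standard residues in a sequence (others ignored)."""
    counts = Counter(seq)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D = N log N - sum(n_i log n_i); zero counts contribute 0."""
    total = sum(counts)
    d = total * math.log(total) if total > 0 else 0.0
    for n in counts:
        if n > 0:
            d -= n * math.log(n)
    return d


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); equals 0 when X and Y are proportional."""
    mixed = [a + b for a, b in zip(x, y)]
    return diversity(mixed) - diversity(x) - diversity(y)


if __name__ == "__main__":
    # Hypothetical "standard source" pooled from two made-up positive fragments.
    positive_source = [a + b for a, b in zip(composition("KVTVNGKTY"), composition("RVEVDGTEY"))]
    print(increment_of_diversity(composition("KIEVNGKRY"), positive_source))
```

Proportional count vectors give an ID of zero, and the value grows as the two compositions diverge, which is why it can serve as a similarity feature.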
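The position weight matrix (PWM) score is the second feature named in the abstract. A common form scores a fixed-length fragment as the sum over positions of log(p_ij / q_j), where p_ij is the frequency of residue j at position i among aligned positive fragments and q_j is a background frequency. The pseudocount of 1, the uniform background of 1/20, and the function names below are assumptions made for illustration; the paper's exact construction may differ.

```python
import math

# Assumed alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(fragments, pseudocount=1.0):
    """Log-odds PWM from equal-length aligned fragments (uniform background assumed)."""
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("all fragments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for frag in fragments:
            counts[frag[pos]] += 1  # raises KeyError for non-standard residues
        total = sum(counts.values())
        pwm.append({aa: math.log((c / total) / background) for aa, c in counts.items()})
    return pwm


def pwm_score(pwm, fragment):
    """Sum of per-position log-odds scores for one fragment."""
    if len(fragment) != len(pwm):
        raise ValueError("fragment length must match the PWM length")
    return sum(column[aa] for column, aa in zip(pwm, fragment))


if __name__ == "__main__":
    pwm = build_pwm(["VNGK", "VDGK", "TNGR"])  # made-up aligned fragments
    print(pwm_score(pwm, "VNGK"), pwm_score(pwm, "AAAA"))
```

Fragments resembling the training motifs receive higher scores than unrelated ones, so the score can be fed to a classifier alongside the other features.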
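The evaluation described in the abstract splits the data into five equal parts, trains on one part, tests on the other four, and reports overall accuracy and the Matthews correlation coefficient (MCC). The sketch below shows that split and the standard formulas Q = (TP + TN) / (TP + TN + FP + FN) and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The random shuffling with a fixed seed and the function names are assumptions; the paper may have partitioned the data differently (for example, stratified by class).

```python
import math
import random


def split_one_train_four_test(samples, seed=0):
    """Shuffle, cut into 5 near-equal folds; fold 0 trains, folds 1-4 test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    train = folds[0]
    test = [s for fold in folds[1:] for s in fold]
    return train, test


def accuracy_and_mcc(y_true, y_pred):
    """Overall accuracy and Matthews correlation coefficient for labels 1 (hairpin) / 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # MCC taken as 0 if undefined
    return accuracy, mcc


if __name__ == "__main__":
    train, test = split_one_train_four_test(range(10))
    print(len(train), len(test))
    print(accuracy_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC is reported alongside accuracy because it stays informative when hairpin and non-hairpin samples are imbalanced, which is why a 79.4% accuracy can coexist with a more modest MCC of 0.48.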
Source
Journal of Xinzhou Teachers University (《忻州师范学院学报》)
2015, No. 5, pp. 6-9, 28 (5 pages)