Abstract
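Starting from protein sequences, we first divide each sequence into 15 segments of equal length and obtain increment of diversity values, low-frequency power spectral density values, N-terminal and C-terminal scoring values, and motif frequencies; together these form a combined vector that represents the sequence information. Using the Random Forest algorithm, we then predict the subclasses within the oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases separately. The overall accuracies are 90.86%, 95.24%, 96.42%, 98.60%, 97.53%, and 98.03%, respectively. Except for the transferases and hydrolases, which are slightly below earlier work, the results are better than previously reported predictions.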
Based on protein sequences, we select the increment of diversity value, the low-frequency power spectral density, matrix scoring function values, and motif frequency as characteristic parameters to describe the sequence information, and propose a Random Forest algorithm for predicting enzyme subclasses. The overall success rates are 90.86%, 95.24%, 96.42%, 98.60%, 97.53%, and 98.03%. Furthermore, applying the same method to the dataset used in previous work yields better results than those previously reported.
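The abstract does not give formulas, so the following is only a minimal sketch of how segment-wise increment of diversity features might be computed. It assumes the commonly used definition D(X) = N log N - sum(n_i log n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y). The use of amino-acid counts as the underlying state, the segment boundary handling, and all function names (`split_equal`, `id_features`, and so on) are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def diversity(counts):
    """Diversity measure D(X) = N*log(N) - sum(n_i*log(n_i)); zero counts are skipped."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when x and y are proportional."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def split_equal(sequence, n_segments=15):
    """Split a sequence into n_segments nearly equal pieces.

    Assumption: boundaries are placed by integer division, so segment
    lengths differ by at most one residue.
    """
    length = len(sequence)
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def composition(segment):
    """Count each standard amino acid in a segment (other symbols are ignored)."""
    counts = Counter(segment)
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def id_features(sequence, reference_counts, n_segments=15):
    """One ID value per segment, measured against a reference count vector.

    Assumption: reference_counts would be the pooled amino-acid counts of a
    training class; the source does not specify the reference source.
    """
    return [
        increment_of_diversity(composition(seg), reference_counts)
        for seg in split_equal(sequence, n_segments)
    ]
```

In a full pipeline, a vector like this would be concatenated with the other feature groups named in the abstract before being passed to a Random Forest classifier; those steps are not sketched here because the source gives no detail on them.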
Source
《阴山学刊(自然科学版)》
2014, No. 2, pp. 22-25 (4 pages)
Yinshan Academic Journal (Natural Science Edition)
Keywords
motif
matrix scoring function value
increment of diversity value
Random Forest algorithm
enzyme subclasses
prediction