Abstract
Starting from the amino acid sequences of proteins with known structures, we used DSSP and PROMOTIF to construct a dataset of the complex super-secondary structure strand-loop-helix-loop-strand (βαβ) motif. We then analyzed the core loop-helix-loop structure of the βαβ motif and restricted the study to motifs whose loop-helix-loop segment is 10 to 26 amino acids long. The dataset contains 1458 protein chains, with 3632 βαβ motifs and 3148 non-βαβ motifs. Hydropathy composition, optimized position-specific amino acid composition, predicted functional motif information, and predicted secondary structure information were combined as sequence features and input to a support vector machine (SVM). In 5-fold cross-validation, the overall accuracy and Matthews correlation coefficient reached 79.7% and 0.59; in the independent test, they reached 73.4% and 0.47.
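As a rough illustration of two quantities named in the abstract, the sketch below computes a hydropathy composition vector for a sequence segment, and the overall accuracy and Matthews correlation coefficient from binary confusion counts. The six-class hydropathy grouping and all function names are assumptions made for this example; the abstract does not specify the grouping or the implementation the authors used.

```python
import math

# Assumed six-class hydropathy grouping, for illustration only;
# the abstract does not list the classes actually used.
HYDROPATHY_CLASSES = {
    "strongly_hydrophilic": "RDENQKH",
    "strongly_hydrophobic": "LIVAMF",
    "weakly_amphipathic": "STYW",
    "proline": "P",
    "glycine": "G",
    "cysteine": "C",
}


def hydropathy_composition(segment):
    """Return the fraction of residues falling in each assumed class.

    Residues outside the 20 standard amino acids (e.g. 'X') are ignored,
    so the fractions are taken over the classified residues only.
    """
    counts = {name: 0 for name in HYDROPATHY_CLASSES}
    for residue in segment.upper():
        for name, members in HYDROPATHY_CLASSES.items():
            if residue in members:
                counts[name] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(counts)
    return [counts[name] / total for name in HYDROPATHY_CLASSES]


def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

With hypothetical counts of 40 true positives, 40 true negatives, 10 false positives, and 10 false negatives, `overall_accuracy` gives 0.8 and `matthews_cc` gives 0.6. In a full pipeline, a vector such as the one returned by `hydropathy_composition` would be concatenated with the other feature groups before being passed to the SVM.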
Source
Journal of Inner Mongolia University of Technology: Natural Science Edition (《内蒙古工业大学学报(自然科学版)》)
2015, No. 3, pp. 177-183 (7 pages)
Funding
National Natural Science Foundation of China (30960090, 31260203)
Keywords
βαβ motif
SVM algorithm
Position-specific amino acids
Hydropathy composition
Super-secondary structure