Abstract
Compared with monomeric proteins, oligomeric proteins have many advantages and take part in a wide range of life processes. This paper proposes a secondary feature extraction method for classifying four classes of protein homo-oligomers from features extracted from the primary sequence. Secondary features are obtained by processing primary features with statistical methods so as to increase the distance among the primary features. A support vector machine (SVM) serves as the classifier, with a one-versus-one strategy for the multi-class problem. In the jackknife test with weighting factors, the feature set built from secondary features and amino acid composition features reached a highest total accuracy of 78.41%, which is 13.09% higher than that of the amino acid composition features, 6.86% higher than that of BG, the best feature set in the reference literature, and 5.53% higher than that of CM1, the best primary feature set, under the same conditions. These results indicate that secondary feature extraction effectively increases the distance among primary features and improves the classification of protein homo-oligomers.
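As a rough illustration of two ingredients named in the abstract, the sketch below computes an amino acid composition feature vector and runs a jackknife (leave-one-out) accuracy estimate. It is not the authors' implementation: the function names, the toy sequences, and the class labels are hypothetical, and a 1-nearest-neighbour rule stands in for the SVM only to keep the example free of external libraries. The secondary features and the weighting factors are not reproduced here, since the abstract does not define them.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_neighbor_predict(train_x, train_y, query):
    """Stand-in classifier (1-nearest neighbour), NOT the paper's SVM."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold out each sample once, fit on the rest, score the held-out sample."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy, made-up sequences and labels, purely for illustration.
toy_sequences = ["AAAAAG", "AAAAGG", "WWWWWY", "WWWWYY"]
toy_labels = ["dimer", "dimer", "trimer", "trimer"]
toy_features = [amino_acid_composition(s) for s in toy_sequences]
toy_accuracy = jackknife_accuracy(toy_features, toy_labels)
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed to `jackknife_accuracy`, for instance a wrapper around a one-versus-one SVM from an external library.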
Source
《北京生物医学工程》
2010, No. 1, pp. 16-22 (7 pages)
Beijing Biomedical Engineering
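Funding
National Natural Science Foundation of China (60775012, 60634030)
Northwestern Polytechnical University Science and Technology Innovation Project (KC02)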
Keywords
homo-oligomers
support vector machine (SVM)
feature extraction
primary feature
secondary feature