Abstract
Objective: To establish a support vector machine (SVM)-based method that automatically identifies the quaternary structure of new peptide chains, improving on the classification accuracy of existing methods. Methods: Four existing methods for extracting feature values from protein primary (amino acid) sequences were improved. An effective combinatorial forecast model was then built by combining their predictions with both linear and non-linear combination methods. Results: Using homodimers and non-homodimers as the test case, each of the four improved feature extraction methods raised classification accuracy by 2-3% over its original version. The linear and non-linear combinatorial forecast raised accuracy by a further 2-3%, bringing the accuracy on the independent test set above 90%. Conclusion: All four feature extraction methods show that the protein primary sequence carries quaternary structure information, and the combinatorial forecast method effectively integrates the strengths of several feature extraction methods.
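The abstract does not name the four feature extraction methods or give the combination formulas, so the following is only a minimal sketch of the general pipeline under stated assumptions. Amino acid composition stands in as one illustrative sequence feature (a hypothetical stand-in, not necessarily one of the paper's four). The base SVM classifiers are assumed to be already trained, so their decision values are taken as inputs. An accuracy-weighted sum and a majority vote are used as simple stand-ins for "linear" and "non-linear" combinatorial forecasting. All function names are hypothetical, and the paper's actual non-linear combiner may differ.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Hypothetical feature extractor: fraction of each standard residue in seq.

    Non-standard letters (e.g. X, B, Z) are ignored; an empty or all-unknown
    sequence yields a zero vector.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def accuracy_weights(val_accuracies):
    """Assumed weighting scheme: weights proportional to each base classifier's
    validation accuracy, normalised to sum to 1."""
    total = sum(val_accuracies)
    return [a / total for a in val_accuracies]


def linear_combination(decision_values, weights):
    """Linear combination: weighted sum of base-classifier decision values.

    Returns +1 (homodimer) if the weighted sum is >= 0, else -1 (non-homodimer).
    """
    score = sum(w * d for w, d in zip(weights, decision_values))
    return 1 if score >= 0 else -1


def majority_vote(decision_values):
    """Simple non-linear combination: each base classifier votes with the sign
    of its decision value; ties are resolved in favour of +1."""
    votes = sum(1 if d >= 0 else -1 for d in decision_values)
    return 1 if votes >= 0 else -1


if __name__ == "__main__":
    # Hypothetical decision values from four base SVMs, one per feature method.
    decisions = [0.9, -0.1, -0.2, -0.3]
    weights = accuracy_weights([0.85, 0.80, 0.82, 0.78])
    print(linear_combination(decisions, weights))  # 1
    print(majority_vote(decisions))                # -1
```

In the usage example the two combiners disagree: one confident positive classifier outweighs three weak negatives in the weighted sum, while the vote ignores magnitudes. Differences of this kind are why combining several combination rules can add accuracy beyond any single one.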
Source
Progress in Modern Biomedicine (《现代生物医学进展》)
Indexed in CAS (Chemical Abstracts Service)
2008, No. 4, pp. 646-648, 637 (4 pages)
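Funding
National Natural Science Foundation of China (No. 30570351)
Program for New Century Excellent Talents in University, Ministry of Education of China (NCET-06-0710)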
Keywords
Protein quaternary structure
Classification
Support vector machines
Combinatorial forecast