Abstract
The Ramachandran plot is a classic tool for validating protein structures and is widely used in structural biology. However, the favored regions defined by the traditional Ramachandran plot are broad and highly error-tolerant, and they include some inaccurate structures. To address this problem, we propose SVM-Rama, a method based on a support vector machine (SVM) and Bayesian optimization that optimizes and subdivides the favored-region definitions of the traditional Ramachandran plot, so that each subdivided favored region corresponds to a specific type of secondary structure. SVM-Rama improves the accuracy of protein structure validation and allows secondary structures to be annotated simply and precisely. The results show that, in secondary structure annotation, the method reaches an accuracy close to the best results obtained by traditional methods, at far lower training and computational cost.
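To illustrate how an SVM can carve (φ, ψ) space into secondary-structure regions, here is a minimal sketch. It is not the authors' SVM-Rama implementation. Several choices are assumptions made only for illustration. The angles are mapped to cos/sin features so that -180° and +180° coincide. The SVM is a bias-free RBF-kernel model trained with kernelized Pegasos (stochastic sub-gradient descent), a stand-in for a standard SVM solver. Hyperparameters (`lam`, `gamma`, `steps`) are fixed by hand instead of being tuned by Bayesian optimization. The training points are hypothetical toy data scattered around approximate textbook centres of the α-helix and β-strand regions. The names `embed`, `rbf` and `RamaSVM` are hypothetical.

```python
import math
import random

def embed(phi, psi):
    """Map (phi, psi) in degrees to 4-D cos/sin features (periodic: -180 == +180)."""
    p, s = math.radians(phi), math.radians(psi)
    return (math.cos(p), math.sin(p), math.cos(s), math.sin(s))

def rbf(u, v, gamma):
    """Gaussian (RBF) kernel on the 4-D embedding."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

class RamaSVM:
    """Hypothetical bias-free kernel SVM trained with kernelized Pegasos."""

    def __init__(self, lam=0.01, gamma=2.0, steps=400, seed=0):
        self.lam, self.gamma, self.steps, self.seed = lam, gamma, steps, seed

    def _score(self, x, scale):
        # Weighted kernel sum over samples that have been margin violators.
        total = sum(a * y * rbf(xj, x, self.gamma)
                    for a, y, xj in zip(self.alpha, self.labels, self.xs) if a)
        return total / scale

    def fit(self, points, labels):
        rng = random.Random(self.seed)
        self.xs = [embed(phi, psi) for phi, psi in points]
        self.labels = list(labels)
        self.alpha = [0] * len(self.xs)  # per-sample violation counts
        for t in range(1, self.steps + 1):
            i = rng.randrange(len(self.xs))
            # Pegasos step: bump alpha_i if sample i violates the margin.
            if self.labels[i] * self._score(self.xs[i], self.lam * t) < 1:
                self.alpha[i] += 1
        return self

    def decision(self, phi, psi):
        return self._score(embed(phi, psi), self.lam * self.steps)

    def predict(self, phi, psi):
        """+1 = helix-like region, -1 = strand-like region (toy labels)."""
        return 1 if self.decision(phi, psi) >= 0 else -1

# Hypothetical toy data around approximate region centres (degrees).
rng = random.Random(42)
helix = [(rng.gauss(-63, 10), rng.gauss(-43, 10)) for _ in range(20)]
strand = [(rng.gauss(-120, 10), rng.gauss(130, 10)) for _ in range(20)]
points = helix + strand
labels = [1] * 20 + [-1] * 20

model = RamaSVM().fit(points, labels)
print(model.predict(-60, -45))   # 1  (helix side)
print(model.predict(-125, 135))  # -1 (strand side)
print(model.predict(300, -45))   # 1  (300 deg is the same angle as -60 deg)
```

With more classes (for example, left-handed helix or polyproline II), the same idea extends to one-vs-rest classifiers, one decision boundary per secondary-structure type. A real pipeline would also search `gamma` and the regularization strength automatically, which is the role Bayesian optimization plays in the method described above.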
Authors
WANG Bo; SU Tianhao; XU Yanting; GAO Heng; GUO Cong; LI Yongle; WU Wei (International Centre for Quantum and Molecular Structures, College of Sciences, Shanghai University, Shanghai 200444, China; Materials Genome Institute, Shanghai University, Shanghai 200444, China)
Source
Journal of Shanghai University (Natural Science Edition)
Indexed in: CAS, CSCD, Peking University Core Journals
2024, No. 3, pp. 545-558 (14 pages)
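Funding
Innovation Program of the Science and Technology Commission of Shanghai Municipality (21JC1402700)
Shanghai Sailing Program, Rising-Star Program of the Shanghai "Science and Technology Innovation Action Plan" (22YF1413300)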
Keywords
Ramachandran plot
support vector machine
structure annotation of proteins