变构是调节蛋白质功能的重要机制,对许多生物过程至关重要。变构调节剂比正构剂具有更高的特异性和更低的毒副作用,这使得变构药物设计比正构药物设计有更多的优势。变构位点的发现是变构药物设计的前提,目前实验上获得的变构位点多是...变构是调节蛋白质功能的重要机制,对许多生物过程至关重要。变构调节剂比正构剂具有更高的特异性和更低的毒副作用,这使得变构药物设计比正构药物设计有更多的优势。变构位点的发现是变构药物设计的前提,目前实验上获得的变构位点多是偶然所得,因此亟待发展有效的理论方法来预测蛋白质变构位点。本工作提出了一种集成的机器学习方法AllosEC用于预测蛋白质变构口袋,该方法除了考虑口袋的理化性质外,还加入了口袋的二级结构信息、深度指数(DPX)和突出指数(CX)特征。另外,为了克服正负样本极度不平衡的问题,本工作使用欠采样方法来平衡训练数据集。在独立测试集上,AllosEC在多个评价指标上优于现有的其他方法,SEN、SPE、PRE和MCC分别为0.708、0.915、0.405和0.486。这样,本工作提供了性能良好的蛋白质变构位点预测方法AllosEC。Allostery is an important mechanism for regulating protein functions, which is essential for many biological processes. Compared with orthosteric regulators, allosteric regulators have higher specificity and lower toxicities, which makes allosteric drug design have more advantages than orthosteric drug design. The discovery of allosteric sites is a prerequisite for allosteric drug design. Currently, experimentally obtained allosteric sites are mostly obtained by chance, and therefore there is an urgent need to develop effective theoretical methods to predict protein allosteric sites. Here, we present an ensemble machine learning method AllosEC for protein allosteric pocket prediction, where besides the pockets’ physicochemical properties, their secondary structure information, depth indexes (DPXes) and protrusion indexes (CXes) are considered. In order to overcome the problem of extreme imbalance between positive and negative samples, this work uses an under sampling method to balance the training dataset. AllosEC outperforms other existing methods in multiple evaluation metrics on the independent test set, with SEN, SPE, PRE and MCC of 0.708, 0.915, 0.405 and 0.486, respectively. Thus, this work provides a good method AllosEC for protein allosteric site prediction.展开更多
文摘变构是调节蛋白质功能的重要机制,对许多生物过程至关重要。变构调节剂比正构剂具有更高的特异性和更低的毒副作用,这使得变构药物设计比正构药物设计有更多的优势。变构位点的发现是变构药物设计的前提,目前实验上获得的变构位点多是偶然所得,因此亟待发展有效的理论方法来预测蛋白质变构位点。本工作提出了一种集成的机器学习方法AllosEC用于预测蛋白质变构口袋,该方法除了考虑口袋的理化性质外,还加入了口袋的二级结构信息、深度指数(DPX)和突出指数(CX)特征。另外,为了克服正负样本极度不平衡的问题,本工作使用欠采样方法来平衡训练数据集。在独立测试集上,AllosEC在多个评价指标上优于现有的其他方法,SEN、SPE、PRE和MCC分别为0.708、0.915、0.405和0.486。这样,本工作提供了性能良好的蛋白质变构位点预测方法AllosEC。Allostery is an important mechanism for regulating protein functions, which is essential for many biological processes. Compared with orthosteric regulators, allosteric regulators have higher specificity and lower toxicities, which makes allosteric drug design have more advantages than orthosteric drug design. The discovery of allosteric sites is a prerequisite for allosteric drug design. Currently, experimentally obtained allosteric sites are mostly obtained by chance, and therefore there is an urgent need to develop effective theoretical methods to predict protein allosteric sites. Here, we present an ensemble machine learning method AllosEC for protein allosteric pocket prediction, where besides the pockets’ physicochemical properties, their secondary structure information, depth indexes (DPXes) and protrusion indexes (CXes) are considered. In order to overcome the problem of extreme imbalance between positive and negative samples, this work uses an under sampling method to balance the training dataset. AllosEC outperforms other existing methods in multiple evaluation metrics on the independent test set, with SEN, SPE, PRE and MCC of 0.708, 0.915, 0.405 and 0.486, respectively. Thus, this work provides a good method AllosEC for protein allosteric site prediction.