Abstract
This paper builds a quantitative prediction model for the ERα biological activity of compounds. It combines ensemble learning with logistic regression and uses automatic parameter optimization so that each algorithm reaches its best generalization performance. First, a random forest algorithm grounded in information theory ranks the molecular descriptors of the compounds by feature importance, according to their effect on the activity of the estrogen receptor α subtype (ERα), yielding the 20 most informative variables for downstream use. Then, based on these 20 molecular descriptors, a ridge regression algorithm quantitatively predicts ERα biological activity. The results show that the model can accurately predict ERα biological activity, offering a new approach to the scientific selection of anti-breast-cancer drugs.
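The two-stage procedure described in the abstract (rank descriptors with a random forest, keep the top 20, then fit a ridge regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, synthetic data in place of the real molecular descriptors, hypothetical helper names (`select_top_descriptors`, `fit_ridge`), and `RidgeCV` as a stand-in for the unspecified automatic parameter optimization. Note also that scikit-learn's regression forest ranks features by impurity (variance) reduction, which may differ from the information-theoretic criterion the paper refers to.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_top_descriptors(X, y, k=20, seed=0):
    """Return the column indices of the k most important features,
    ranked by a random forest's impurity-based importances."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # descending
    return order[:k]


def fit_ridge(X, y, alphas=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Fit a standardized ridge regression; the penalty strength is
    chosen by cross-validation over the (assumed) alpha grid."""
    model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
    model.fit(X, y)
    return model


# Synthetic stand-in data: 300 "compounds", 60 "descriptors",
# of which only the first five actually drive the response.
rng = np.random.default_rng(0)
n_compounds, n_descriptors = 300, 60
X = rng.normal(size=(n_compounds, n_descriptors))
true_weights = np.zeros(n_descriptors)
true_weights[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_weights + rng.normal(scale=0.3, size=n_compounds)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stage 1: rank descriptors on the training split only, keep the top 20.
top = select_top_descriptors(X_train, y_train, k=20)

# Stage 2: ridge regression restricted to the selected descriptors.
model = fit_ridge(X_train[:, top], y_train)
r2 = model.score(X_test[:, top], y_test)

print("selected descriptor indices:", sorted(top.tolist()))
print("held-out R^2:", round(r2, 3))
```

Ranking the descriptors on the training split alone keeps the held-out score from being inflated by feature selection that has already seen the test compounds.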
Source
Modeling and Simulation (《建模与仿真》)
2023, Issue 3, pp. 1820-1828 (9 pages)