Abstract
Breast cancer is one of the most common cancers worldwide and has a high mortality rate. This paper builds a random forest model that accounts for the nonlinear relationships among the molecular descriptors of compounds and uses it to predict their biological activity quantitatively. To find the optimal values of the molecular descriptors, a genetic algorithm with roulette-wheel selection is applied, and the ADMET properties of the compounds are predicted by classification; these predictions are used to make the prediction of antagonist biological activity more efficient. The results show that the random forest model achieves high prediction accuracy, which effectively improves its reference value, and that repeated iterations of the genetic algorithm accurately locate the optimal value of the dependent variable, providing data support and a theoretical reference for research on anti-breast cancer drugs.
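The roulette-wheel genetic algorithm mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the encoding, crossover, mutation, or parameter settings, so everything below is an assumption made for illustration: real-valued individuals, arithmetic crossover, Gaussian mutation, elitism, and the hypothetical names `roulette_select` and `genetic_maximize`. The `objective` argument stands in for whatever the search is scoring (in the paper's setting, presumably the predicted activity); here it is an arbitrary user-supplied function.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        # Degenerate wheel: fall back to a uniform choice.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if running > pick:
            return individual
    return population[-1]


def genetic_maximize(objective, bounds, pop_size=40, generations=60,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize `objective` over box `bounds` (all settings are assumptions)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        scores = [objective(ind) for ind in population]
        # Shift scores so every fitness value is positive on the wheel.
        low = min(scores)
        fitness = [s - low + 1e-9 for s in scores]
        next_pop = [best[:]]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            a = roulette_select(population, fitness, rng)
            b = roulette_select(population, fitness, rng)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover weight
                child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            else:
                child = a[:]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            next_pop.append(child)
        population = next_pop
        best = max(population, key=objective)
    return best, objective(best)
```

For example, `genetic_maximize(lambda v: -(v[0] - 1) ** 2, [(-5, 5)])` searches the interval [-5, 5] for the maximizer of a toy quadratic. Because the best individual is always carried forward, the best score cannot decrease from one generation to the next.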
Authors
任静莹
马成满
毕四旭
邵喜高
REN Jingying; MA Chengman; BI Sixu; SHAO Xigao (School of Mathematics and Statistics Science, Ludong University, Yantai 264039, China)
Source
《鲁东大学学报(自然科学版)》
2023, No. 2, pp. 159-164 (6 pages)
Journal of Ludong University: Natural Science Edition
Funding
Shandong Provincial Social Science Planning Research Project (20CSDJ10).
Keywords
random forest
genetic algorithm
breast cancer drugs
biological activity