Abstract
This study examines a series of compounds that act on estrogen receptor α (ERα), an important target in breast cancer treatment, and investigates the relationship between their molecular structure and biological activity (pIC50) in order to build a quantitative structure-activity relationship (QSAR) model for anti-breast-cancer drug candidates. The model uses the KNN-Bagging ensemble learning method, which combines multiple weak KNN predictors into a strong ensemble predictor of pIC50. The ensemble model reaches a coefficient of determination R² of 0.9496 and a root mean square error (abbreviated MSE in the original) of 0.0016, a statistically significant improvement over traditional multiple linear regression and multiple nonlinear regression. The results indicate that, when the molecular descriptors of the compounds span multiple data types, a QSAR model built with KNN-Bagging predicts biological activity against ERα well and can provide theoretical guidance for screening compounds that antagonize ERα activity.
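The idea of combining several weak KNN predictors by bagging can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the synthetic two-descriptor data, the Euclidean distance, k = 3, and the 25 bootstrap models are all assumptions chosen for illustration, and the printed scores say nothing about the results reported above.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points nearest to x (Euclidean)."""
    ranked = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def fit_knn_bagging(X, y, n_estimators=25, seed=0):
    """Build one bootstrap resample (drawn with replacement) per base KNN model."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_knn_bagging(models, x, k=3):
    """Average the predictions of all base KNN models for one sample x."""
    preds = [knn_predict(bx, by, x, k) for bx, by in models]
    return sum(preds) / len(preds)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def make_toy_data(n, seed):
    """Hypothetical stand-in for (descriptors, pIC50) pairs; purely synthetic."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(n)]
    y = [5.0 + 2.0 * a - b + rng.gauss(0, 0.05) for a, b in X]
    return X, y


X_train, y_train = make_toy_data(200, seed=1)
X_test, y_test = make_toy_data(50, seed=2)

models = fit_knn_bagging(X_train, y_train)
y_pred = [predict_knn_bagging(models, x) for x in X_test]

print("R^2:", r2_score(y_test, y_pred))
print("MSE:", mse(y_test, y_pred))
```

Each base model sees a different resample of the training set, so averaging their outputs tends to reduce the variance of a single KNN regressor. In practice the descriptors would usually be scaled before computing distances, since KNN is sensitive to feature magnitudes.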
Authors
董金耐
谢卓冉
殷歌
王海文
杨淼
顾佳慧
DONG Jin-nai; XIE Zhuo-ran; YIN Ge; WANG Hai-wen; YANG Miao; GU Jia-hui (School of Electronic Engineering, Jiangsu Ocean University, Lianyungang 222000, China)
Source
《数学的实践与认识》
2023, No. 1, pp. 130-139 (10 pages)
Mathematics in Practice and Theory
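Funding
National Natural Science Foundation of China (12171205)
Jiangsu Province Basic Research Program (Natural Science Foundation) (BK20191469)
Jiangsu Province Natural Resources Development Special Fund (Marine Science and Technology Innovation) Project (JSZRHYKJ202116)
Jiangsu Province Postgraduate Research Innovation Program (KYCX20-2769, KYCX20-2768, KYCX2021-053, KYCX22_3395).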