Abstract
Estrogen receptor alpha (ERα) is regarded as an important target for endocrine therapy of breast cancer. This paper uses random forest, support vector machine, and multiple linear regression to build models that predict ERα inhibitor activity for 1,974 compounds. Variance filtering and Lasso regression were used to select the 50 molecular descriptors most strongly correlated with ERα inhibitor activity, and the 1,974 compounds were split into training and test sets at a ratio of 4:1. Mean squared error (MSE) was used to evaluate predictive performance. The training-set and test-set MSEs were 0.475 and 0.553 for random forest, 0.653 and 0.792 for support vector machine, and 0.709 and 0.801 for multiple linear regression. The results show that random forest outperforms the other two methods and offers good robustness and predictive ability for ERα inhibitor activity.
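The evaluation protocol named in the abstract (variance filtering, a 4:1 train/test split, MSE on both sets) can be illustrated with a minimal sketch. This is not the paper's code: the data are synthetic, the function names are hypothetical, the Lasso step is omitted, and a mean-value baseline stands in for the random forest, support vector machine, and multiple linear regression models.

```python
import random
from statistics import mean, pvariance


def variance_filter(rows, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if pvariance([row[j] for row in rows]) > threshold]


def split_4_to_1(n_samples, seed=0):
    """Shuffle sample indices and split them 4:1 into train and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = (n_samples * 4) // 5
    return indices[:cut], indices[cut:]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Synthetic stand-in for a descriptor matrix (assumption: not the paper's data).
# Column 1 is constant, so variance filtering should drop it.
rng = random.Random(42)
n = 100
X = [[rng.gauss(0, 1), 1.0, rng.gauss(0, 2)] for _ in range(n)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

kept = variance_filter(X)
train_idx, test_idx = split_4_to_1(n)

# Placeholder model: predict the training-set mean for every compound.
baseline = mean(y[i] for i in train_idx)
train_mse = mse([y[i] for i in train_idx], [baseline] * len(train_idx))
test_mse = mse([y[i] for i in test_idx], [baseline] * len(test_idx))

print("kept descriptor columns:", kept)
print("train/test sizes:", len(train_idx), len(test_idx))
print("train MSE:", train_mse, "test MSE:", test_mse)
```

In a fuller version, the baseline would be replaced by fitted regression models, and the train and test MSE of each model would be compared in the same way.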
Author
Du Xueping (School of Science, Hubei University of Technology, Wuhan 430068, China)
Source
Scientific and Technological Innovation (《科学技术创新》)
2022, Issue 11, pp. 1-4 (4 pages)
Keywords
ERα inhibitor
Feature selection
Random forest
Support vector machine
Multiple linear regression