Abstract
According to incomplete statistics, about 2.26 million new cases of breast cancer were diagnosed worldwide in 2020. Female breast cancer is the most common cancer type and ranks fifth in mortality, so research on its treatment has become increasingly important. Studies of the estrogen receptor alpha subtype (ERα) show that it plays an important role in breast development. This paper collects compounds that act on ERα together with their biological activity data, taking a series of molecular structure descriptors as the independent variables and the compounds' biological activity values as the dependent variable. A molecular screening model is built with random forest and gradient boosting decision tree (GBDT) methods, combined with expert knowledge, and the 20 molecular descriptors with the most significant impact on biological activity are selected. These descriptors are of great significance for guiding the structural optimization of existing active compounds and for drug research.
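The abstract does not say how the two tree ensembles are combined, so the following is only a minimal sketch of one plausible reading: fit a random forest and a GBDT regressor with scikit-learn, average their impurity-based feature importances, and keep the 20 highest-ranked descriptors. The helper `rank_descriptors`, the `desc_i` names, the synthetic data, and the simple averaging rule are all assumptions made for illustration. The expert-knowledge step mentioned in the abstract is not modeled here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor


def rank_descriptors(X, y, names, top_k=20, seed=0):
    """Rank features by the mean of RF and GBDT impurity-based importances.

    Hypothetical helper: the averaging rule is an assumption, not the
    paper's documented fusion method.
    """
    rf = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    gbdt = GradientBoostingRegressor(random_state=seed).fit(X, y)
    # Each model's feature_importances_ is normalized to sum to 1,
    # so a plain average puts both models on the same scale.
    combined = (rf.feature_importances_ + gbdt.feature_importances_) / 2.0
    order = np.argsort(combined)[::-1][:top_k]
    return [(names[i], float(combined[i])) for i in order]


# Synthetic stand-in for a descriptor matrix (rows: compounds, columns:
# descriptors) and an activity vector; NOT the ERα data set from the paper.
# With shuffle=False the 5 informative features are columns 0 to 4.
X, y = make_regression(
    n_samples=300, n_features=50, n_informative=5,
    shuffle=False, random_state=0,
)
names = [f"desc_{i}" for i in range(X.shape[1])]

top20 = rank_descriptors(X, y, names)
for name, score in top20[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor high-cardinality or correlated features, so in practice a ranking like this would typically be cross-checked, for example with permutation importance on held-out data, before descriptors are fixed.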
Author
ZHENG Jianguo (郑剑国), College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Source
Journal of Fujian Computer (《福建电脑》)
2022, No. 3, pp. 1-4 (4 pages)
Keywords
Breast Cancer
Feature Selection
Random Forest
Gradient Boosting Decision Tree