Abstract
Compound property prediction is a core task in drug development, toxicology, and environmental behavior prediction. New synthetic chemicals continue to emerge, and although the corresponding experimental data keep growing, experimental studies cannot keep pace with the rate at which novel chemicals are developed. In recent years, machine learning algorithms and models have shown distinct advantages and great potential for compound property prediction, providing reliable model-predicted data especially where experimental data are scarce. This paper outlines the main steps of applying machine learning to compound property prediction and the corresponding modules, including datasets, molecular description methods, model performance evaluation metrics, and evaluation methods. It also systematically summarizes applications of machine learning to predicting the physicochemical properties, bioactivity, and toxicity of compounds. Finally, it analyzes existing problems and future challenges in terms of datasets, molecular featurization, and model interpretation.
Authors
WANG Ziwei (王紫维)
HAN Min (韩民)
JIN Biao (金彪)
State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou 510640, China; University of Chinese Academy of Sciences, Beijing 100049, China
Source
Environmental Chemistry (《环境化学》)
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 1, pp. 69-81 (13 pages)
Funding
Supported by the Key Special Project of the National Key Research and Development Program of China (2019YFC1805500, 2019YFC1805503).
Keywords
machine learning
compound property
molecular structure
model prediction