Abstract
As technology advances, an increasing number of emerging contaminants (ECs) are being generated and released into the environment. These ECs are often biologically toxic and environmentally persistent, posing risks to ecosystems and human health. Consequently, a growing body of research focuses on identifying, preventing, and controlling EC risks. ECs occur in natural environments under complex conditions, and traditional analytical methods for identifying and predicting their risks are time-consuming and labor-intensive. Machine learning (ML) is a data-driven research method that trains models on existing data to better predict and uncover how the objects of study develop and change. Because ML can capture complex relationships among parameters, it has been applied increasingly widely to EC risk identification and control in recent years. Combining traditional techniques with ML can lower computational costs and, by optimizing and reducing the number of experiments, save experimental time and energy. Applications of ML to ECs fall broadly into four categories: toxicity prediction, identification and classification, property assessment, and assisted removal. This review surveys the applications of ML to typical ECs (nanomaterials, micro- and nanoplastics, antibiotic resistance genes, perfluoroalkyl and polyfluoroalkyl substances, endocrine-disrupting chemicals, and persistent organic pollutants) and the challenges involved. In toxicity research, ML can reduce the need for animal experiments. However, data in this field are generally scarce: some models are trained on small, low-quality datasets, which limits their scope of application. Moreover, not every environmental problem is suited to ML, and over-reliance on it is unnecessary. Complex ML models often lack interpretability, which makes it difficult for researchers to understand how a model arrives at a specific prediction and therefore to explain and understand the environmental behavior and risks of ECs. Beyond the model algorithm itself, the quality of the training data determines the accuracy of an ML model and of its predictions. Future work should therefore focus on improving the quality of EC data, promoting data sharing, establishing a unified and comprehensive database, and developing research systems or frameworks suited to the study of ECs.
Authors
胡献刚
王张佳
邓鹏
于福波
穆莉
王赛
周启星
HU XianGang; WANG ZhangJia; DENG Peng; YU FuBo; MU Li; WANG Sai; ZHOU QiXing (Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China; Key Laboratory for Environmental Factors Control of Agro-Product Quality Safety (Ministry of Agriculture and Rural Affairs), Institute of Agro-Environmental Protection, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou 570228, China)
Source
《中国科学:技术科学》
EI
CSCD
Peking University Core Journal
2024, Issue 10, pp. 1838-1853 (16 pages)
Scientia Sinica(Technologica)
Funding
National Key Research and Development Program of China (Grant No. 2020YFC1807000)
National Natural Science Foundation of China (Grant No. U22A20615)
Keywords
machine learning
emerging contaminants
nanomaterials
antibiotics
microplastics
endocrine-disrupting chemicals