
Machine Learning Algorithms and Their Applications in Environmental Microbiology Research (cited 2 times)

A review of machine learning algorithms for environmental microbiology
Abstract: Microorganisms are ubiquitous in the environment. They are key players in biogeochemical cycles and environmental evolution, and they also play important roles in environmental monitoring, ecological governance, and protection. High-throughput technologies have generated massive amounts of microbial data and expanded the scope of microbiome research. Building machine learning models to analyze these complex data is of great scientific and practical importance for identifying microbial markers, predicting pollutants, and forecasting environmental quality. Machine learning algorithms fall into two broad categories: supervised learning and unsupervised learning. In microbiome research, unsupervised learning captures the structure of the input data through clustering and dimensionality reduction, which allows microbial data to be integrated and grouped. Supervised learning trains models on microbial datasets that have both features and labels; when the trained model is given new data with features but no labels, it infers the labels, thereby classifying, identifying, or predicting the new samples. However, sophisticated machine learning algorithms often prioritize predictive accuracy at the expense of interpretability. A machine learning model is often regarded as a "black box" that predicts a specific outcome while revealing little about how that prediction was reached. To apply machine learning more widely in microbiome research and to extract valuable biological information, it is essential to understand the algorithms in depth and to improve model interpretability. This review introduces the machine learning algorithms commonly used in environmental microbiology and the steps for building machine learning models from microbiome data, including feature selection, algorithm selection, model construction, and evaluation. It then surveys applications of various machine learning models in environmental microbiology to explore in depth the relationship between the microbiome and its surrounding environment, discusses ways to improve model interpretability, and aims to provide a scientific reference for future environmental monitoring and environmental health prediction.
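As a minimal sketch of the unsupervised clustering mentioned in the abstract, assume a small hypothetical OTU count table. The code converts counts to relative abundances, computes pairwise Bray-Curtis dissimilarities, and groups samples by average-linkage hierarchical clustering. These specific choices (Bray-Curtis, average linkage, k = 2), the data, and all function names are illustrative assumptions rather than methods stated in the abstract; dimensionality reduction such as PCA or PCoA is not shown. Only the Python standard library is used.

```python
"""Sketch: unsupervised clustering of microbial samples (hypothetical data)."""


def relative_abundance(counts):
    """Scale each sample (row) of raw read counts so that it sums to 1."""
    return [[c / sum(row) for c in row] for row in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two profiles (0 = identical, 1 = disjoint)."""
    den = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / den if den else 0.0


def average_linkage(dm, k):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    dissimilarity until only k clusters remain; return sorted index lists."""
    clusters = [[i] for i in range(len(dm))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dm[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a
        del clusters[b]                  # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical OTU table: 6 samples (rows) x 4 OTUs (columns) of read counts.
counts = [
    [90, 5, 3, 2], [80, 10, 6, 4], [85, 8, 4, 3],    # e.g. site group 1
    [4, 6, 70, 20], [2, 8, 60, 30], [5, 5, 75, 15],  # e.g. site group 2
]
profiles = relative_abundance(counts)
dm = [[bray_curtis(u, v) for v in profiles] for u in profiles]
clusters = average_linkage(dm, k=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

The two recovered clusters match the two hypothetical site groups because within-group Bray-Curtis dissimilarities (about 0.05 to 0.18) are far smaller than between-group ones (about 0.86 for samples 0 and 3).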
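The supervised workflow outlined in the abstract (feature selection, algorithm selection, model construction, and evaluation), together with a simple interpretability check, can be sketched on hypothetical data. In this sketch a prevalence filter stands in for feature selection, a nearest-centroid classifier stands in for the chosen algorithm, leave-one-out cross-validation serves as evaluation, and drop-column importance (the accuracy lost when one taxon is removed) is one way to look inside the "black box". These are illustrative substitutes, not the authors' methods; the taxa, labels, abundances, and the 50% prevalence threshold are invented. Libraries such as scikit-learn offer fuller counterparts (for example random forests, `cross_val_score`, and `permutation_importance`).

```python
"""Sketch of a supervised microbiome workflow on hypothetical data:
prevalence filter -> nearest-centroid classifier -> leave-one-out
evaluation -> drop-column importance. Standard library only."""
import math


def prevalence_filter(X, min_prevalence=0.5):
    """Indices of taxa detected (abundance > 0) in >= min_prevalence of samples."""
    n = len(X)
    return [j for j in range(len(X[0]))
            if sum(row[j] > 0 for row in X) / n >= min_prevalence]


def fit_centroids(X, y):
    """Mean feature vector of each class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row):
    """Label whose centroid is nearest to `row` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the classifier restricted to `features`."""
    sub = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(sub)):
        model = fit_centroids(sub[:i] + sub[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, sub[i]) == y[i]
    return correct / len(sub)


def drop_column_importance(X, y, features, names):
    """Accuracy lost when each feature is removed and the model re-evaluated."""
    base = loo_accuracy(X, y, features)
    return {names[j]: base - loo_accuracy(X, y, [f for f in features if f != j])
            for j in features}


# Hypothetical relative abundances of four taxa (rest of community omitted).
taxa = ["Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D"]
X = [
    [0.20, 0.02, 0.10, 0.00],  # clean
    [0.25, 0.03, 0.12, 0.00],  # clean
    [0.22, 0.01, 0.08, 0.00],  # clean
    [0.21, 0.30, 0.11, 0.00],  # polluted
    [0.24, 0.28, 0.09, 0.02],  # polluted
    [0.23, 0.32, 0.10, 0.00],  # polluted
]
y = ["clean"] * 3 + ["polluted"] * 3

kept = prevalence_filter(X)      # Taxon_D (present in 1 of 6 samples) is dropped
print([taxa[j] for j in kept])   # ['Taxon_A', 'Taxon_B', 'Taxon_C']
print(loo_accuracy(X, y, kept))  # 1.0
importance = drop_column_importance(X, y, kept, taxa)
print(importance)
```

On this toy table, removing Taxon_A or Taxon_C leaves leave-one-out accuracy unchanged, while removing Taxon_B makes it collapse, which would flag Taxon_B as a candidate marker. Note that drop-column importance can understate a taxon whose information is shared with correlated taxa, because the remaining features compensate for its removal.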
Authors: CHEN He (陈鹤), TAO Ye (陶晔), MAO Zhendu (毛振镀), XING Peng (邢鹏) (State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, Jiangsu, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Acta Microbiologica Sinica (《微生物学报》), 2022, No. 12, pp. 4646-4662 (17 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
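Funding: National Natural Science Foundation of China (91751111, 31670505, 31722008); Natural Science Foundation of Jiangsu Province (BK20220015); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2014273).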
Keywords: machine learning; microbiome; environmental microorganisms; 16S rRNA gene; metagenome