Abstract
Microbial communities strongly influence the macroscopic properties of their environment, but microbial data are high-dimensional, complex, and sparse, which poses new challenges for understanding the relationship between microorganisms and the ecological environment. Advances in machine learning and the widespread adoption of second-generation DNA sequencing offer a new way to address this problem. This study used soil microbiome and dissolved organic carbon (DOC) data from 308 samples collected in a 44-day plant litter decomposition experiment. With 1709 bacterial operational taxonomic units (OTUs) as features, 12 commonly used machine learning models were built. Feature selection was performed with an embedded method, a wrapper method, and a fused embedded-wrapper method, and the gradient boosting decision tree (GBDT) was chosen as the best-performing model for parameter optimization. Root mean square error, mean absolute error, and goodness of linear fit served as the evaluation metrics. The results showed that feature selection reduced the data dimensionality and improved model accuracy, and that in the simulation experiments the embedded-wrapper fusion method performed best with the applied model. A DOC prediction model was then built by combining the embedded-wrapper fusion method with GBDT, and its effectiveness was verified experimentally. The results offer a new approach to estimating dissolved organic carbon from bacterial data with machine learning.
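The pipeline summarized in the abstract (an embedded filter, then a wrapper search, then a GBDT regressor scored by RMSE, MAE, and goodness of fit) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses a small synthetic count table in place of the 308 x 1709 OTU data, takes GBDT feature importances as the embedded step and recursive feature elimination as the wrapper step, and treats the goodness of linear fit as R². The function names, feature counts, and hyperparameters are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_otu_table(n_samples=120, n_otus=60, n_informative=5, seed=0):
    """Sparse, count-like stand-in for an OTU table plus a DOC-like target."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam=3.0, size=(n_samples, n_otus)).astype(float)
    counts[rng.random(counts.shape) < 0.6] = 0.0  # mimic sparse abundance data
    weights = rng.uniform(1.0, 3.0, size=n_informative)
    # Only the first n_informative columns drive the synthetic target.
    y = counts[:, :n_informative] @ weights + rng.normal(0.0, 0.5, n_samples)
    return counts, y


def fused_feature_selection(X, y, n_embedded=20, n_final=8, seed=0):
    """Embedded step (GBDT importances) followed by a wrapper step (RFE)."""
    # Embedded step: keep the n_embedded features with the highest importance.
    embedded = SelectFromModel(
        GradientBoostingRegressor(n_estimators=50, random_state=seed),
        threshold=-np.inf,
        max_features=n_embedded,
    ).fit(X, y)
    kept = np.flatnonzero(embedded.get_support())
    # Wrapper step: recursive feature elimination on the surviving features.
    wrapper = RFE(
        GradientBoostingRegressor(n_estimators=50, random_state=seed),
        n_features_to_select=n_final,
        step=2,
    ).fit(X[:, kept], y)
    return kept[wrapper.get_support()]  # column indices in the original table


def evaluate(model, X_test, y_test):
    """Return RMSE, MAE, and R² on held-out data."""
    pred = model.predict(X_test)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }


X, y = make_synthetic_otu_table()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
# Select features on the training split only, to avoid leaking test data.
selected = fused_feature_selection(X_train, y_train)
model = GradientBoostingRegressor(random_state=0).fit(X_train[:, selected], y_train)
scores = evaluate(model, X_test[:, selected], y_test)
print(len(selected), scores)
```

Running the embedded step first keeps the more expensive wrapper search confined to a short list of candidates, which is one plausible motivation for fusing the two methods when the feature count far exceeds the sample count.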
Authors
MA Yunpeng (马云鹏)
ZHU Jing (朱静)
CUI Xinghua (崔兴华)
College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China
Source
Current Biotechnology (《生物技术进展》)
2023, Issue 4, pp. 645-653 (9 pages)
Funding
Basic Research Project of the Institute of Animal Husbandry, Xinjiang Academy of Animal Sciences (2020BD1002-2-2-2).
Keywords
machine learning
microorganism
feature screening
modeling prediction
organic carbon