Abstract
In recent years, statistical and machine learning methods have been widely used to analyze the relationship between the human gut microbial metagenome and metabolic diseases, which is of great significance for the functional annotation and development of microbial communities. In this study, we propose a new, generalizable framework that combines gut metagenome image augmentation with deep learning to classify and predict human metabolic diseases. Each sample in three representative human gut metagenome datasets was converted into an image and augmented, then fed into four machine learning models, logistic regression (LR), support vector machine (SVM), Bayesian network (BN) and random forest (RF), and two deep learning models, multilayer perceptron (MLP) and convolutional neural network (CNN). Prediction performance was evaluated with five metrics, accuracy (A), precision (P), recall (R), F1 score (F1) and area under the receiver operating characteristic (ROC) curve (AUC), using 10-fold cross-validation. The results showed that the MLP model outperformed CNN, LR, SVM, BN, RF and the PopPhy-CNN method overall, and that the performance of both MLP and CNN improved further after data augmentation (random rotation and added salt-and-pepper noise). With augmentation, the accuracy of the MLP model in disease prediction increased by a further 4%-11%, F1 by 1%-6% and AUC by 5%-10%. These results indicate that image augmentation of human gut metagenome data combined with deep learning can accurately extract microbiota features and effectively predict the host disease phenotype. The source code and datasets used in this study are publicly available on GitHub: https://github.com/HuaXWu/GM_ML_Classification.git.
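The augmentation step named in the abstract (random rotation plus salt-and-pepper noise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how a sample is laid out as an image or which rotation angles are used, so the row-major square layout, the restriction to quarter-turn rotations, the 5% noise fraction, and all function names (`vector_to_image`, `rotate90`, `salt_and_pepper`, `augment`) are hypothetical choices made for this example.

```python
import math
import random


def vector_to_image(abundances):
    """Zero-pad a 1-D abundance vector and lay it out row-major as a square grid.

    Assumption: the published pipeline may use a different layout.
    """
    n = len(abundances)
    side = math.isqrt(max(n - 1, 0)) + 1  # smallest side with side*side >= n
    padded = list(abundances) + [0.0] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def rotate90(image, k):
    """Return a copy of the image rotated clockwise by k quarter turns."""
    rotated = [list(row) for row in image]
    for _ in range(k % 4):
        # Reversing the row order and transposing is a clockwise quarter turn.
        rotated = [list(row) for row in zip(*rotated[::-1])]
    return rotated


def salt_and_pepper(image, amount, rng, low=0.0, high=1.0):
    """Replace each pixel with probability `amount` by `low` (pepper) or `high` (salt)."""
    noisy = [list(row) for row in image]
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < amount:
                row[c] = high if rng.random() < 0.5 else low
    return noisy


def augment(image, rng, amount=0.05):
    """One augmented copy: a random quarter-turn rotation, then salt-and-pepper noise."""
    return salt_and_pepper(rotate90(image, rng.randrange(4)), amount, rng)


# Made-up relative abundances for a single sample (illustrative values only).
sample = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = vector_to_image(sample)
augmented = [augment(image, rng) for _ in range(4)]
```

Each augmented copy keeps the label of its source sample. In a cross-validated setup such as the 10-fold scheme described above, augmentation would normally be applied only to the training folds, so that noisy copies of a test sample never appear in training.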
Authors
Huiyi Zheng (郑慧怡)
Huaxuan Wu (吴华煊)
Zhiqiang Du (杜志强)
College of Animal Science and Technology, Yangtze University, Jingzhou 434025, China
Source
《遗传》 Hereditas (Beijing)
CAS
CSCD
Peking University Core Journals (北大核心)
2024, Issue 10, pp. 886-896 (11 pages)
Funding
Anhui Province Livestock and Poultry Joint Breeding and Improvement Project (2021-2025).
Keywords
gut metagenome
data augmentation
machine learning
deep learning
disease prediction