Abstract
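With the development of high-throughput sequencing technology, metagenomic databases have been greatly enriched, which makes it possible to use them to analyze human disease and health status. Among these applications, disease prediction based on analysis of the human gut microbiome has become a representative research direction. Using phylum-level taxonomic gut microbiota data, i.e., operational taxonomic unit (OTU) data, and combining non-negative matrix factorization (NMF) with variational autoencoders (VAE), this paper proposes two new machine learning classification algorithms. They are designed to extract key information from the gut microbiota in order to identify patients with disease. Through dimensionality reduction, data generation, and the introduction of penalty constraint terms, we improved predictive performance and reduced model overfitting. On simulated data, liver cirrhosis data, and diabetes data, our prediction models performed well, achieving AUC values of 0.926, 0.956, and 0.745, respectively.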
With advances in high-throughput sequencing technologies, metagenomic databases have expanded considerably, making it possible to analyze human health and disease from them. Among these applications, disease prediction based on analysis of the human gut microbiota has become a prominent research avenue. In this study, we used taxonomic gut microbiota data at the phylum level, known as operational taxonomic unit (OTU) data, and introduced two novel machine learning classification algorithms that combine non-negative matrix factorization (NMF) with variational autoencoder (VAE) methods. These algorithms are designed to extract critical information from the gut microbiota in order to identify patients with disease. Through dimensionality reduction, data generation, and penalty constraints incorporated into the models, we improved predictive performance and reduced overfitting. Across simulated data, liver cirrhosis data, and diabetes data, our predictive models performed well, achieving AUC values of 0.926, 0.959, and 0.745, respectively.
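The abstract describes reducing high-dimensional OTU abundance data to a few non-negative factors before classifying and scoring the result by AUC. The sketch below illustrates only that general idea under stated assumptions. It is not the authors' algorithm and omits the VAE-based data generation entirely. It uses Lee-Seung multiplicative-update NMF on a hypothetical synthetic relative-abundance matrix (`make_synthetic_otu`) and an L2-penalized logistic regression (`fit_logistic`) as a stand-in for the paper's penalty constraints. AUC is computed from its pairwise-ranking definition. All function names, the rank `k=2`, and the hyperparameters are illustrative choices.

```python
# Illustrative sketch only: NMF features + L2-penalized logistic regression.
# This is NOT the paper's NMF/VAE method; names and settings are hypothetical.
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Factor non-negative X (samples x taxa) as X ~ W @ H with
    Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def nmf_transform(X, H, n_iter=500, eps=1e-10, seed=0):
    """Project new samples onto a fixed basis H by updating W only."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], H.shape[0]))
    HHt = H @ H.T
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ HHt + eps)
    return W

def fit_logistic(Z, y, lam=0.1, lr=0.1, n_iter=2000):
    """Logistic regression with penalty lam/2 * ||w||^2, fitted by gradient descent.
    The L2 penalty is an assumed stand-in for the paper's penalty terms."""
    n, d = Z.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / n + lam * w)
        b -= lr * np.mean(p - y)
    return w, b

def auc(y_true, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def make_synthetic_otu(n=200, m=30, depth=5000, seed=1):
    """Hypothetical data: relative abundances where cases are enriched in the first 5 taxa."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    base = rng.dirichlet(np.full(m, 2.0))
    shift = np.zeros(m)
    shift[:5] = 0.05
    X = np.empty((n, m))
    for i in range(n):
        p = base + shift * y[i]
        counts = rng.multinomial(depth, p / p.sum())
        X[i] = counts / depth  # counts -> relative abundance
    return X, y

X, y = make_synthetic_otu()
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]
W_tr, H = nmf(X_tr, k=2)          # low-dimensional features from training data only
W_te = nmf_transform(X_te, H)     # reuse the learned basis for held-out samples
mu, sd = W_tr.mean(axis=0), W_tr.std(axis=0) + 1e-8
w, b = fit_logistic((W_tr - mu) / sd, y_tr)
test_auc = auc(y_te, ((W_te - mu) / sd) @ w + b)
print(f"held-out AUC: {test_auc:.3f}")
```

Fitting `H` on the training samples only and projecting held-out samples with `nmf_transform` keeps test data out of the factorization, so the held-out AUC is not inflated by leakage.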
Source
Advances in Applied Mathematics (《应用数学进展》)
2024, No. 1, pp. 199-207 (9 pages)