Abstract
Objective: To construct and evaluate a gut microbiota-based model for early prediction of future Parkinson's disease (PD) in the population, and to perform functional analysis of gut microbiota metagenomic KO data in order to explore potential therapeutic targets for PD.
Methods: Data were obtained from the Zenodo database. First, gut microbiota relative abundance data were standardized with Z-scores and reduced in dimension with ZicoSeq. A binary logistic regression model with variable selection by the adaptive least absolute shrinkage and selection operator (LASSO) was then built. Predictive performance was evaluated with the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and with calibration curves, and clinical utility was assessed with decision curve analysis (DCA). Second, differentially expressed genes (DEGs) in the gut microbiota metagenomic KO data were identified with the limma package and subjected to gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. The DEGs were further screened by combining protein-protein interaction (PPI) networks, support vector machine-recursive feature elimination (SVM-RFE), and random forest (RF).
Results: The ROC curve and calibration curve of the adaptive LASSO logistic regression model indicated good predictive performance, and DCA showed a relatively large net benefit for the model. PPI network analysis and the machine learning methods together yielded 6 core DEGs: arabinose transport system permease protein (L-arabinose operon Q, araQ); glyceraldehyde-3-phosphate dehydrogenase, type II (rendered in the original English abstract as mitochondrial FAD-dependent glyceraldehyde-3-phosphate dehydrogenase); dCTP deaminase (dcd); signal recognition particle 19 kDa protein (SRP19); processing of precursor 5, ribonuclease P/MRP subunit (S. cerevisiae) (POP5); and inositol-3-phosphate synthase 1 (ISYNA1).
Conclusions: The logistic regression model based on adaptive LASSO variable selection is advantageous for predicting PD and can support early detection, early intervention, and early treatment of PD patients. The core genes identified provide scientific guidance for the development of PD treatments.
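The abstract names two evaluation tools, AUC and decision curve analysis, without giving formulas. As a minimal sketch under stated assumptions, the Python below computes a rank-based AUC and the net benefit used in DCA, NB = TP/n - (FP/n) * pt/(1 - pt), at a single threshold probability pt. The function names and the toy labels and probabilities are hypothetical illustrations, not the study's code or data; the study's own implementation is not described in this record.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in DCA: treat every subject regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (invented for illustration only): 1 = PD case, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]

auc = roc_auc(y_true, probs)
nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"AUC={auc:.3f}, net benefit (model)={nb_model:.3f}, (treat all)={nb_all:.3f}")
```

On this toy data the model's net benefit at pt = 0.5 exceeds that of the treat-all strategy, which is the kind of comparison a decision curve plots across a range of thresholds.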
Authors
HE Changying (何长颖); WEI Yuting (韦雨婷); CHEN Jia (陈佳)
Department of Military Health Statistics, Department of Military Preventive Medicine, Army Medical University, Chongqing 400038, China
Source
Chinese Journal of Disease Control & Prevention (《中华疾病控制杂志》)
Indexed in: CAS; CSCD; Peking University Core Journals (北大核心)
2024, Issue 9, pp. 1096-1103 (8 pages)
Funding
National Natural Science Foundation of China (82173621).
Keywords
Parkinson's disease
Gut microbiota
Machine learning
Bioinformatic analysis