Abstract
Objective: To establish and evaluate a diagnostic model for scleroderma that combines machine learning with an artificial neural network, based on mitochondria-related genes from the GEO database.
Methods: Three scleroderma microarray datasets were obtained from the GEO database. GSE95065 and GSE59785 were merged as the experimental dataset, from which the expression levels of mitochondria-related genes were extracted. Random forest, LASSO regression, and SVM algorithms were used to screen mitochondria-related feature genes of scleroderma. These feature genes were used to construct an artificial neural network model, whose accuracy was assessed by 10-fold cross-validation. The model was further validated on the validation dataset GSE76807, with accuracy evaluated by the area under the ROC curve (AUC). The relative mRNA expression of the key genes was verified by RT-qPCR in a mouse model of scleroderma. Finally, the CIBERSORT algorithm was used to estimate the bioinformatic association between scleroderma and the screened potential biomarkers.
Results: A total of 24 differentially expressed genes were obtained, including 11 up-regulated and 13 down-regulated genes. The 3 machine learning algorithms identified the 7 most relevant mitochondria-related feature genes (POLB, GSR, KRAS, NT5DC2, NOX4, IGF1, and TGM2), which were used to construct the artificial neural network diagnostic model. The model achieved an AUC of 0.984 on the experimental dataset and 0.740 on the validation dataset, and the mean AUC across 10-fold cross-validation exceeded 0.980. RT-qPCR showed that, compared with the control group, the expression of POLB (P=0.004), GSR (P=0.029), KRAS (P=0.007), NOX4 (P=0.019), IGF1 (P=0.008), and TGM2 (P<0.0001) was significantly up-regulated in scleroderma, whereas the expression of NT5DC2 (P=0.001) was significantly down-regulated. Immune cell infiltration analysis showed that the feature genes were associated with follicular helper T cells, naive B cells, resting dendritic cells, activated memory CD4+ T cells, M0 macrophages, monocytes, resting memory CD4+ T cells, and activated mast cells.
Conclusion: The artificial neural network diagnostic model built on scleroderma feature genes in this study provides a new perspective for exploring the pathogenesis of scleroderma.
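The evaluation metrics named in the abstract (ROC AUC and the mean AUC over 10-fold cross-validation) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the helper names `roc_auc`, `stratified_folds`, and `cross_validated_auc`, and the `fit_and_score` callback, are hypothetical and introduced only for illustration. The sketch assumes binary labels (1 = scleroderma, 0 = control) and at least k samples per class.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds while preserving class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(labels, fit_and_score, k=10, seed=0):
    """Mean held-out AUC over k folds.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that fits
    any model on train_idx and returns one score per index in test_idx.
    """
    aucs = []
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        scores = fit_and_score(train, fold)
        aucs.append(roc_auc([labels[i] for i in fold], scores))
    return sum(aucs) / len(aucs)


# Toy usage: one synthetic feature that separates the classes perfectly.
toy_labels = [0] * 20 + [1] * 20
toy_feature = [float(i) for i in range(40)]
toy_mean_auc = cross_validated_auc(
    toy_labels, lambda train, test: [toy_feature[i] for i in test]
)
```

In practice the callback would refit the classifier on each training split, so that every fold's AUC is computed on samples the model has not seen.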
Authors
左志威
孟庆良
崔家康
郭克磊
卞华
ZUO Zhiwei; MENG Qingliang; CUI Jiakang; GUO Kelei; BIAN Hua (School of Orthopedics and Traumatology, Henan University of Traditional Chinese Medicine, Department of Rheumatology//Henan Provincial Hospital of Traditional Chinese Medicine, Zhengzhou 450008, China; Henan Key Laboratory of Zhang Zhongjing Formulae and Herbs for Immunoregulation, Nanyang Institute of Technology, Nanyang 473004, China)
Source
《南方医科大学学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2024, Issue 5, pp. 920-929 (10 pages)
Journal of Southern Medical University
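Funding
National Natural Science Foundation of China (82074415)
Central Plains Talent Program: Central Plains Science and Technology Innovation Leading Talent Project (234200510006)
Henan Provincial Science and Technology Program (232102311201)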
Keywords
scleroderma
mitochondria
artificial neural network
immune cell infiltration
machine learning