Weighted gene co-expression network analysis and machine learning identification of key genes in rheumatoid arthritis synovium
Abstract

BACKGROUND: Rheumatoid arthritis is a systemic immune-related disease whose main pathological features are inflammatory hyperplasia of the joint synovium and destruction of articular cartilage. Its pathogenesis remains unclear, so new diagnostic biomarkers with high sensitivity and specificity are urgently needed.

OBJECTIVE: To identify and screen key genes in the synovium of patients with rheumatoid arthritis by combining bioinformatics techniques with machine learning algorithms, and to construct and validate a prediction model for rheumatoid arthritis.

METHODS: Three datasets containing synovial tissue from patients with rheumatoid arthritis (GSE77298, GSE55235, GSE55457) were downloaded from the Gene Expression Omnibus (GEO) database. GSE77298 and GSE55235 served as the training set and GSE55457 as the test set. In total, 66 samples were included: 39 synovial samples from patients with rheumatoid arthritis and 27 normal synovial samples. Differentially expressed genes in the training set were screened in R. Weighted gene co-expression network analysis was then used to group the training-set genes into modules, and the feature genes of the key module were selected. The differentially expressed genes and the feature genes were intersected, and the shared genes were passed on to machine learning. Three machine learning methods (the least absolute shrinkage and selection operator algorithm, support vector machine with recursive feature elimination, and the random forest algorithm) were applied to the shared genes to obtain hub genes. The hub genes from the three algorithms were intersected again to give the key genes in rheumatoid arthritis synovium. A nomogram model for predicting rheumatoid arthritis was built with the key genes as variables to estimate a patient's risk of developing the disease, and receiver operating characteristic curves were used to assess the diagnostic value of the prediction model and of its key genes.

RESULTS AND CONCLUSION: (1) Differential analysis identified 730 differentially expressed genes in the training set, weighted gene co-expression network analysis yielded 185 feature genes, and 159 genes were shared by both. (2) The least absolute shrinkage and selection operator algorithm identified 4 hub genes, support vector machine with recursive feature elimination identified 11, and random forest identified 5; intersecting the three sets gave 2 key genes (TNS3 and SDC1). (3) Nomograms based on the 2 key genes were constructed in the training and test sets; their calibration curves fit the standard curve well, and the model showed good clinical efficacy in predicting the onset of rheumatoid arthritis. (4) These results indicate that TNS3 and SDC1, obtained with bioinformatics and machine learning algorithms, may become key targets for the diagnosis and treatment of rheumatoid arthritis.
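The two intersection steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (the abstract states only that R was used). The helper `consensus` is a hypothetical name, and every gene label of the form `GENE_*` is a made-up placeholder. Only TNS3 and SDC1 come from the abstract, and the toy set sizes do not match the reported counts (730, 185, 159, 4, 11, 5).

```python
from functools import reduce


def consensus(*gene_sets):
    """Return the genes present in every input collection, sorted by name."""
    if not gene_sets:
        return []
    return sorted(reduce(set.intersection, map(set, gene_sets)))


# Stage 1: genes that are both differentially expressed and members of the
# key co-expression module. GENE_* names are hypothetical placeholders.
degs = {"TNS3", "SDC1", "GENE_A", "GENE_B", "GENE_C", "GENE_X"}
module_genes = {"TNS3", "SDC1", "GENE_A", "GENE_B", "GENE_C", "GENE_Y"}
candidates = consensus(degs, module_genes)

# Stage 2: hub genes that each feature-selection method kept from the
# candidates (toy selections, assumed for illustration only).
lasso_hubs = {"TNS3", "SDC1", "GENE_A"}
svm_rfe_hubs = {"TNS3", "SDC1", "GENE_B", "GENE_C"}
rf_hubs = {"TNS3", "SDC1", "GENE_C"}
key_genes = consensus(lasso_hubs, svm_rfe_hubs, rf_hubs)

print(candidates)
print(key_genes)
```

Requiring agreement across all three selectors is a conservative choice: a gene survives only if every method retains it, which is why the final list is much shorter than any single method's output.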
Authors: Wu Yingkai; Shi Gaolong; Xie Zonggang (The Second Affiliated Hospital of Soochow University, Suzhou 215000, Jiangsu Province, China; The First People's Hospital of Ningyang County, Taian 271000, Shandong Province, China)
Source: Chinese Journal of Tissue Engineering Research (indexed in CAS; Peking University Core Journals), 2025, Issue 2, pp. 294-301, 8 pages
Keywords: weighted gene co-expression network; machine learning algorithm; rheumatoid arthritis; key gene; prediction model
