
Construction and analysis of a combined discriminative model of random forest and feedforward neural network for peripheral blood RNA sequencing data in bipolar disorder
Abstract
Objective To identify characteristic genes of bipolar disorder with a random forest and to build a diagnostic discriminative model for bipolar disorder with a neural network.
Methods Gene expression data from patients with bipolar disorder (n=20) and healthy controls (n=15) were taken from the GSE23848 dataset. Background correction used negative control probes, and normalization used both negative and positive control probes. Differentially expressed genes were identified by linear model analysis with empirical Bayes statistics. A random forest model was built to select features from the differentially expressed genes, and a neural network model was constructed from the characteristic genes it identified. The model's discriminative performance was validated on the independent external dataset GSE39653, comprising patients with bipolar disorder (n=8) and healthy controls (n=24). The biological functions of the characteristic genes were explored through gene ontology (GO) analysis and protein-protein interaction (PPI) networks.
Results A total of 1330 differentially expressed genes related to bipolar disorder were screened, and 35 characteristic genes were selected for model construction. The final model was a feedforward neural network with four hidden layers and four dropout layers, with 50433 trainable parameters. Confidence intervals for sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), and accuracy, computed by bootstrap with 1000 resamples, were all 1. In the GSE39653 external validation set, the model's AUC was 0.72. Enrichment analysis of the characteristic genes indicated that the functions of the genes in the model are related to mitochondrial structure and energy metabolism.
Conclusion The random forest method can identify characteristic genes of bipolar disorder, and a diagnostic model that combines a random forest with a feedforward neural network shows good classification performance in bipolar disorder.
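The bootstrap step mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `bootstrap_auc_ci`, the percentile-interval choice, the redraw of single-class resamples, and the toy scores are all assumptions made for illustration. Only the 1000 resamples and the 8 versus 24 class sizes are taken from the abstract.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap.

    Assumption: resamples that contain only one class are redrawn,
    because AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(labels, scores), lower, upper


# Toy data only (hypothetical scores, not study results):
# 8 "patients" and 24 "controls", matching the validation set sizes.
demo_rng = random.Random(42)
demo_labels = [1] * 8 + [0] * 24
demo_scores = [demo_rng.gauss(0.6, 0.2) for _ in range(8)] + \
              [demo_rng.gauss(0.4, 0.2) for _ in range(24)]
point, lower, upper = bootstrap_auc_ci(demo_labels, demo_scores)
print(f"AUC = {point:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With perfectly separated scores every valid resample has an AUC of 1, so the percentile interval collapses to [1, 1]. With a set as small as 32 samples the interval is typically wide.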
Authors Wang Xiangwen (王相文); Feng Shunkang (冯顺康); Chen Hong (陈红); Wang Shenghai (王圣海); Sun Ping (孙平) (Qingdao Mental Health Center, Qingdao 266034, China)
Source Chinese Journal of Psychiatry (《中华精神科杂志》), indexed in CAS, CSCD, and the Peking University Core list; 2024, Issue 4, pp. 213-220 (8 pages)
Funding Shandong Province Medical and Health Science and Technology Development Program (202203090255).
Keywords Bipolar disorder; Gene expression profiling; Neural networks (Computer)