Abstract
Objective: To integrate transcriptomic and DNA methylome data from hepatocellular carcinoma (HCC) using bioinformatics methods, to screen HCC-specific markers by Lasso regression analysis, and to construct a tumor prediction model. Methods: RNA sequencing (RNA-seq) and whole-genome bisulfite sequencing (WGBS) data from a total of 53 HCC patients in the GSE70091 and GSE77314 datasets were downloaded from the GEO (Gene Expression Omnibus) database. Differential analyses of the transcriptome and of the methylome were performed separately between HCC tissues and adjacent normal tissues, and the resulting differentially expressed genes (DEGs) and differentially methylated genes (DMGs) were integrated to identify candidate HCC marker genes. GO (Gene Ontology) and Reactome pathway enrichment analyses were performed on the candidate marker genes, Lasso regression analysis was used to select marker genes and construct an HCC prediction model, and the model's performance was validated in other cohorts. Results: A total of 288 DEGs (|log2FC| > 1, P.adj < 0.05) and 28528 DMGs (P < 0.05) were identified. Intersecting the DEGs with the DMGs yielded 51 hypermethylated and downregulated (Hyper-Down) genes and 111 hypomethylated and upregulated (Hypo-Up) genes. GO and Reactome enrichment analyses showed that the Hypo-Up genes were enriched mainly in cell mitosis pathways (FDR < 0.05, P < 0.05), whereas the Hyper-Down genes were associated mainly with transcriptional activation (FDR < 0.05, P < 0.05). Lasso regression analysis selected 11 genes with non-zero coefficients, which were used to construct the HCC prediction model. Finally, the model achieved an area under the curve (AUC) of 1 in GSE77314 and 0.998 in the external validation cohort (TCGA-LIHC). Conclusion: By integrating HCC multi-omics data and applying Lasso regression analysis, 11 gene markers were identified and an HCC prediction model was constructed.
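The DEG filter uses an adjusted P value (P.adj < 0.05), but the abstract does not name the adjustment method. A common choice, and the default in tools such as DESeq2 and limma, is the Benjamini-Hochberg (BH) procedure; the minimal sketch below assumes BH. The function name `bh_adjust` and the p-values are hypothetical and are not taken from the study.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    # indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values never increase with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# hypothetical raw p-values for five genes
raw_p = [0.01, 0.02, 0.03, 0.005, 0.5]
padj = bh_adjust(raw_p)
# genes passing the P.adj < 0.05 cut-off used for DEGs
significant = [i for i, q in enumerate(padj) if q < 0.05]
```

In a real DEG call this cut-off is combined with the |log2FC| > 1 threshold before a gene is kept.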
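The integration step keeps genes that are both DEGs and DMGs and whose methylation and expression change in opposite directions. The sketch below shows this under stated assumptions: each gene has one methylation difference (tumor minus normal) and one p-value, DMGs are filtered on P < 0.05 alone as in the abstract, and the function name `integrate` and the gene names are hypothetical. How the study mapped methylation sites or regions to genes is not described here.

```python
def integrate(expr, meth, lfc_cut=1.0, padj_cut=0.05, meth_p_cut=0.05):
    """Split genes into Hyper-Down and Hypo-Up sets.

    expr: {gene: (log2FC, padj)}  tumor vs. adjacent normal expression
    meth: {gene: (delta_meth, p)} tumor minus normal methylation difference
    """
    # DEGs: |log2FC| > 1 and adjusted P < 0.05
    degs = {g: lfc for g, (lfc, padj) in expr.items()
            if abs(lfc) > lfc_cut and padj < padj_cut}
    # DMGs: P < 0.05 with a non-zero methylation difference
    dmgs = {g: d for g, (d, p) in meth.items() if p < meth_p_cut and d != 0}
    shared = degs.keys() & dmgs.keys()
    # opposite directions only; concordant genes (e.g. hyper + up) are dropped
    hyper_down = sorted(g for g in shared if dmgs[g] > 0 and degs[g] < 0)
    hypo_up = sorted(g for g in shared if dmgs[g] < 0 and degs[g] > 0)
    return hyper_down, hypo_up


# hypothetical example values
expr = {"GENE_A": (-2.0, 0.01), "GENE_B": (1.5, 0.001), "GENE_C": (0.5, 0.01),
        "GENE_D": (3.0, 0.2), "GENE_E": (2.0, 0.01)}
meth = {"GENE_A": (0.3, 0.01), "GENE_B": (-0.2, 0.03), "GENE_C": (0.4, 0.01),
        "GENE_D": (-0.1, 0.01), "GENE_E": (0.2, 0.01)}
hyper_down, hypo_up = integrate(expr, meth)
```

Here GENE_C fails the fold-change cut-off, GENE_D fails the P.adj cut-off, and GENE_E is hypermethylated yet upregulated, so only GENE_A (Hyper-Down) and GENE_B (Hypo-Up) remain.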
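Lasso selects markers because its L1 penalty shrinks weak coefficients to exactly zero, which leaves a small set with non-zero coefficients (11 genes in this study). The abstract does not state the loss function or how the penalty λ was chosen; penalized logistic regression with a cross-validated λ is a common setup for tumor-versus-normal labels. For brevity, the sketch below swaps in a plain least-squares Lasso on 0/1 labels, solved by cyclic coordinate descent with a fixed λ. It is not the authors' model, and the data and the name `lasso_cd` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - b0 - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # centre predictors and response so the unpenalised intercept drops out
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals while all coefficients are zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its coefficient at zero
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * delta  # keep residuals in sync with beta
            beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


# hypothetical data: 6 samples (3 tumor = 1, 3 normal = 0) x 3 genes
X = [
    [2.0, 1.0, 0.5],
    [2.0, -1.0, 0.0],
    [2.0, 0.0, -0.5],
    [0.0, 1.0, 0.5],
    [0.0, -1.0, 0.0],
    [0.0, 0.0, -0.5],
]
y = [1, 1, 1, 0, 0, 0]
b0, beta = lasso_cd(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # genes with non-zero coefficients
```

Only gene 0 separates tumor from normal samples in this toy matrix, so it is the only coefficient left non-zero. Raising `lam` above its correlation with the labels (for example `lam=0.6`) zeroes every coefficient.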
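The reported AUC values (1 and 0.998) can be read as the probability that a randomly chosen tumor sample receives a higher model score than a randomly chosen normal sample, with ties counted as one half. This quantity is the normalized Mann-Whitney U statistic. The sketch below computes it directly from that definition. The function name `auc` and the scores are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (tumor, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# every tumor scores above every normal: AUC = 1
perfect = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
# one of the four tumor/normal pairs is mis-ordered: AUC = 0.75
overlap = auc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of samples, which is fine at cohort sizes of a few hundred. A rank-based formula gives the same value faster.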
Authors
LUO Yanrui (罗焱瑞)
ZHAO Qian (赵倩)
Department of Cell Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
Source
Journal of Tianjin Medical University (《天津医科大学学报》)
2024, No. 3, pp. 205-211, 254 (8 pages)
Keywords
liver cancer
differentially expressed genes
differentially methylated genes
Lasso regression analysis