Abstract
Objective: To screen potential biomarkers associated with the malignant progression and prognosis of lung squamous cell carcinoma using bioinformatics and machine learning methods, and to provide a basis for studying its molecular mechanisms.
Methods: Sequencing datasets and clinical characteristics of lung squamous cell carcinoma and adjacent non-tumor tissues were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases (the RNA sequencing data comprised 493 tumor samples and 49 normal tissue samples from 542 patients with lung squamous cell carcinoma). Differentially expressed genes were analyzed in R. Protein-protein interaction (PPI) network analysis using the STRING protein interaction network database and weighted gene co-expression network analysis (WGCNA) were used to screen key genes. LASSO-Cox regression was then used to construct a prognostic model for lung squamous cell carcinoma, and survival analysis was performed to screen hub genes associated with survival.
Results: In the TCGA dataset (49 normal samples and 493 lung squamous cell carcinoma samples), 2966 up-regulated and 2760 down-regulated genes were identified; in the GEO datasets GSE87410 and GSE158420, 927 significantly up-regulated and 734 significantly down-regulated differential genes were identified; 516 genes were common to both. WGCNA yielded 2 central modules (P<0.05): the 57 genes in the first module were all up-regulated in the datasets; in the second module, FETUB and HRG were down-regulated and the remaining genes were up-regulated. Gene Ontology (GO) enrichment analysis showed that the module genes were mainly associated with spindle organization, mitosis, nuclear division and sister chromatid segregation (all P<0.05). Based on the expression data and clinical information of 542 lung squamous cell carcinoma tumor samples in TCGA, product-limit (Kaplan-Meier) survival analysis showed that kinesin family member 15 (KIF15), fibrinogen gamma chain (FGG) and apolipoprotein H (APOH) were associated with prognosis (all P<0.05), and that KIF15, FGG and APOH were all up-regulated in tumor tissues relative to normal tissues. The TCGA clinical data were split 1:1 into a training set (n=271, P<0.05) and a test set (n=271, P<0.05), and the LASSO-Cox algorithm yielded a 2-gene prognostic risk score model comprising FGG (HR=1.076, 95% CI: 1.042-1.112, P<0.001) and FOSB (HR=1.117, 95% CI: 1.060-1.176, P<0.001), with the formula Risk score = FGG × 0.0537388696259006 + FOSB × 0.0655548508563815. In both the training and test datasets, the low-risk group had a significantly better prognosis than the high-risk group; in the test set, the areas under the curve of the prognostic model for 1-, 3- and 5-year survival were 0.62, 0.61 and 0.59, respectively (P<0.05).
Conclusion: WGCNA combined with LASSO-Cox regression analysis and basic experimental validation showed that FGG is highly expressed in lung squamous cell carcinoma cells and that high FGG expression predicts poor prognosis in patients with lung squamous cell carcinoma; FGG is a possible prognostic biomarker for lung squamous cell carcinoma.
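The risk score formula reported in the abstract can be illustrated with a minimal Python sketch. The two coefficients are taken from the abstract. Everything else is an assumption made for illustration: the expression values are invented, and the median cutoff used to form high- and low-risk groups is a common convention that the abstract does not confirm.

```python
from statistics import median

# Coefficients as reported in the abstract.
COEF_FGG = 0.0537388696259006
COEF_FOSB = 0.0655548508563815


def risk_score(fgg: float, fosb: float) -> float:
    """Linear risk score: FGG x coefficient + FOSB x coefficient."""
    return COEF_FGG * fgg + COEF_FOSB * fosb


def split_by_median(scores: list[float]) -> list[str]:
    """Label each score 'high' or 'low' relative to the cohort median.

    Assumption: the abstract does not state the cutoff; the median is
    used here purely as an illustrative choice.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical (FGG, FOSB) expression values, not data from the study.
patients = [(2.0, 1.0), (10.0, 4.0), (0.5, 0.2), (6.0, 8.0)]
scores = [risk_score(fgg, fosb) for fgg, fosb in patients]
groups = split_by_median(scores)

for (fgg, fosb), score, group in zip(patients, scores, groups):
    print(f"FGG={fgg:5.1f}  FOSB={fosb:5.1f}  score={score:.4f}  group={group}")
```

Because both coefficients are positive, higher expression of either gene raises the score, which is consistent with the hazard ratios above 1 reported for FGG and FOSB.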
Authors
徐家文
李明
XU Jiawen; LI Ming (Department of Thoracic Surgery, the Affiliated Cancer Hospital of Nanjing Medical University & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing, Jiangsu, 210009, China)
Source
《热带医学杂志》
CAS
2023, Issue 12, pp. 1671-1677, 1794 (8 pages in total)
Journal of Tropical Medicine
Funding
Scientific Research Project of the Jiangsu Provincial Health Commission (ZD2022027)
Nanjing Science and Technology Plan Project (2022SX00000446)