Abstract
OBJECTIVE: To identify aberrantly expressed genes in esophageal cancer (ESCA) by mining the Gene Expression Omnibus (GEO) database with bioinformatics methods, and to explore the expression and clinical significance of these genes in ESCA.
METHODS: The ESCA microarray datasets GSE38129 and GSE20347 were downloaded from GEO using the GEOquery package in R. After batch effects were removed with the sva package, differentially expressed genes (DEGs) were screened in the normalized dataset with the limma package. Gene Ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs were performed with the clusterProfiler package. A protein-protein interaction (PPI) network of the DEGs was built on the STRING website, and core modules and core genes (hub genes) were extracted with the MCODE and cytoHubba plugins. The hub genes were entered into the University of Alabama at Birmingham Cancer Database (UALCAN) to analyze the relationship between their expression levels and ESCA stage, methylation level, TP53 mutation, and other features. Finally, the core genes were validated with the microarray dataset GSE70409.
RESULTS: A total of 390 DEGs were identified in the normalized dataset, of which 166 were up-regulated and 224 were down-regulated. GO analysis showed that they were mainly involved in biological processes such as mitotic cell cycle phase transition, extracellular matrix production, and epidermis development. KEGG enrichment analysis showed that the DEGs were associated with pathways such as cell cycle, extracellular matrix (ECM)-receptor interaction, and amoebiasis. The PPI network was imported into Cytoscape, and a total of 20 key genes were screened. Analysis of the key genes in UALCAN identified three genes (CDK1, TOP2A, AURKA) whose mRNA expression was significantly higher in ESCA than in normal tissue (P<0.05). Expression of these three genes was positively correlated with ESCA clinical stage, TP53 mutation rate, and the promoter methylation level of the respective gene (P<0.05). Expression of all three genes was significantly higher in male than in female patients (P<0.05), and significantly higher in patients aged 41-60 years than in other age groups (P<0.05). Kaplan-Meier survival analysis in the Gene Expression Profiling Interactive Analysis (GEPIA) database showed that ESCA patients with high CDK1 or AURKA expression had shorter overall survival than those with low expression (P=0.036 and P=0.033, respectively); CDK1 and AURKA expression was therefore negatively associated with ESCA prognosis.
CONCLUSION: Using a combination of bioinformatics methods, three core genes were identified whose expression was higher in ESCA than in normal tissue. Expression of CDK1, TOP2A, and AURKA was positively correlated with tumor stage, the promoter methylation level of the respective gene, and TP53 mutation status. Of these, CDK1 and AURKA may serve as new molecular markers for the clinical diagnosis and prognosis of esophageal cancer.
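The survival comparison reported in the abstract can be illustrated with a minimal sketch. It assumes patients are split into high- and low-expression groups at the cohort median (the abstract does not state the cutoff used) and applies the Kaplan-Meier product-limit estimator to each group. The function names and the toy cohort are hypothetical and are not data from the study; the significance test behind the reported P values is not reproduced here.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the cohort median (assumed cutoff)."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) pairs, one per event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical toy cohort, for illustration only.
expression = [2.1, 8.4, 3.3, 9.0, 7.7, 1.5]
months = [30, 6, 24, 9, 12, 36]
died = [0, 1, 1, 1, 0, 0]

groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(label, curve)
```

A curve that drops faster for the high-expression group corresponds to the shorter overall survival described for CDK1 and AURKA.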
Authors
赵立然
卜梁
ZHAO Liran; BU Liang (Department of Thoracic Surgery, Xiangan Hospital of Xiamen University, Xiamen 361100; School of Medicine, Xiamen University, Xiamen 361100, Fujian, China)
Source
《癌变.畸变.突变》
CAS
2023, Issue 5, pp. 374-381 (8 pages)
Carcinogenesis, Teratogenesis & Mutagenesis
Funding
Xiamen Medical and Health Science and Technology Program (3502Z20194045)
Xiamen Medical and Health Guidance Project (3502Z20214ZD1128).