Abstract
Objective To identify key inflammatory response genes associated with prognosis in lung adenocarcinoma (LUAD) and to construct a prognostic prediction model based on these genes.
Methods LUAD tissue data were downloaded from the TCGA database as the training set, and normal lung tissue data were downloaded from the GTEx database as the normal control for the training set, in order to screen for differentially expressed genes (DEGs). A list of inflammatory response-related genes was downloaded from the Molecular Signatures Database, and univariate Cox regression was used to identify those associated with prognosis. These were intersected with the DEGs to obtain the inflammatory response-related genes associated with prognosis in LUAD. LASSO regression and the random survival forest (RSF) algorithm were then applied to select the key genes and to establish a prognostic risk score formula. Internal validation used the training set; external validation used LUAD data downloaded from the GEO database as the validation set. Receiver operating characteristic (ROC) curves were drawn for the risk score in predicting 1-, 3-, and 5-year survival. Patients were divided into high-risk and low-risk groups according to the cut-off value, and overall survival (OS) was compared between the groups. Univariate and multivariate Cox regression analyses were used to examine the association between the risk score and OS in the training and validation sets. All independent prognostic factors were then combined into a nomogram predicting 1-, 3-, and 5-year survival in the training set.
Results There were 48 DEGs between LUAD tissues and normal lung tissues, and 50 inflammatory response-related genes were associated with prognosis; their intersection yielded 11 inflammatory response-related genes associated with prognosis in LUAD. LASSO regression and RSF screening identified nine key genes: adrenomedullin (ADM), discoidin, CUB and LCCL domain-containing protein 2 (DCBLD2), interleukin 7 receptor (IL-7R), MAX dimerization protein 1 (MXD1), neuromedin U receptor 1 (NMUR1), protocadherin 7 (PCDH7), phosphoinositide-3-kinase regulatory subunit 5 (PIK3R5), scavenger receptor class F member 1 (SCARF1), and serpin family E member 1 (SERPINE1, plasminogen activator inhibitor 1). In internal and external validation, the area under the ROC curve (AUC) of the risk score for predicting 1-, 3-, and 5-year survival was 0.73, 0.64, and 0.68 in the training set and 0.578, 0.602, and 0.581 in the validation set, respectively; OS was shorter in the high-risk group than in the low-risk group in both sets (all P<0.01). Univariate and multivariate Cox regression analyses showed that the risk score was an independent factor for OS in both the training and validation sets (training set HR=2.99, 95%CI: 2.11-4.23, P<0.01; validation set HR=2.47, 95%CI: 1.43-4.28, P<0.01). The nomogram containing all independent prognostic factors (age, sex, tumor stage, risk score) had an overall concordance index of 0.710, indicating high predictive accuracy, and the calibration curve agreed well with the standard curve.
Conclusions Nine key inflammatory response genes associated with prognosis in LUAD were identified, namely ADM, DCBLD2, IL-7R, MXD1, NMUR1, PCDH7, PIK3R5, SCARF1, and SERPINE1. The prognostic risk model based on these genes helps to assess the prognosis of patients with LUAD.
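As an illustration of how a gene-based prognostic risk score and a cut-off split of this kind are typically computed, the sketch below forms a linear combination of coefficients and expression values and then labels patients as high or low risk. It is a minimal sketch, not the authors' implementation: the coefficients, the three-gene subset, the patient values, and the use of the median as the default cut-off are all assumptions made for illustration, since the abstract reports neither the fitted coefficients nor how the cut-off was chosen.

```python
from statistics import median

# Hypothetical coefficients for illustration only; the abstract does not
# report the fitted values, and only three of the nine genes are shown.
COEFS = {"ADM": 0.10, "DCBLD2": 0.20, "IL7R": -0.15}


def risk_score(expr, coefs=COEFS):
    """Return the linear predictor: sum of coefficient * expression per gene."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def split_by_cutoff(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk.

    The default cut-off is the median score, which is an assumption here.
    """
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression values for three hypothetical patients.
patients = {
    "P1": {"ADM": 2.0, "DCBLD2": 1.0, "IL7R": 3.0},
    "P2": {"ADM": 5.0, "DCBLD2": 4.0, "IL7R": 1.0},
    "P3": {"ADM": 1.0, "DCBLD2": 0.5, "IL7R": 4.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_cutoff(scores)
```

In practice the coefficients would come from the fitted survival model, and the resulting groups would be compared with a survival analysis such as a log-rank test.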
Authors
胥婉婷
加依娜·拉兹别克
刘新亚
文保锋
曹明芹
XU Wanting; Jiayina Lazibiek; LIU Xinya; WEN Baofeng; CAO Mingqin (Department of Public Health, Xinjiang Medical University, Urumqi 830017, China)
Source
《山东医药》
CAS
2023, Issue 35, pp. 19-23 (5 pages)
Shandong Medical Journal
Funding
General Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C288).
Keywords
inflammatory response-related genes
prognostic prediction
machine learning
bioinformatics
lung adenocarcinoma