Abstract
Objective: To explore the value of machine learning models for postoperative survival risk stratification in patients with thoracic esophageal squamous cell carcinoma (ESCC).
Methods: The clinical data of 369 patients with thoracic ESCC who underwent radical esophagectomy at the Department of Thoracic Surgery of Northern Jiangsu People's Hospital from January 2014 to September 2015 were retrospectively analyzed. There were 279 males (75.6%) and 90 females (24.4%), aged 41-78 years. The patients were randomly divided into a training set (259 patients) and a test set (110 patients) at a ratio of 7:3. Variables were screened by best subset selection. Six machine learning models were then built on the selected variables and validated in the independent test set. Predictive performance was evaluated by area under the curve (AUC), accuracy, and logarithmic loss, and model fit was assessed with calibration curves. The best-performing model was chosen as the final model. Risk stratification was performed with X-tile, and survival was analyzed using the Kaplan-Meier method with the log-rank test.
Results: The 5-year postoperative survival rate was 67.5%. No clinicopathological characteristic differed significantly between the training and test sets (all P>0.05). Seven variables were included in modelling: hypertension, smoking history, alcohol consumption history, degree of tissue differentiation, pN stage, vascular invasion, and nerve invasion. The AUC values in the independent test set were: decision tree (AUC=0.796), support vector machine (AUC=0.829), random forest (AUC=0.831), logistic regression (AUC=0.838), gradient boosting machine (AUC=0.846), and XGBoost (AUC=0.853). XGBoost was selected as the best model and used to stratify patients in the training and test sets into low-risk, intermediate-risk, and high-risk groups. In both sets, postoperative prognosis differed significantly among the three groups (P<0.001).
Conclusion: Machine learning models are valuable for predicting postoperative prognosis in thoracic ESCC. The XGBoost model predicted 5-year postoperative survival better than the other common machine learning methods compared, and it shows high utility and reliability.
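The evaluation steps named in the Methods (a random 7:3 hold-out split, then AUC, accuracy, and logarithmic loss on the test set) can be sketched as follows. This is a minimal illustration only, not the authors' code: the abstract does not state which software was used, the function names are hypothetical, the labels and probabilities are toy values rather than study data, and the hold-out size of 110 is simply taken from the abstract.

```python
import math
import random


def split_train_test(ids, n_test, seed=0):
    """Shuffle ids reproducibly and hold out n_test of them (hypothetical helper)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def auc(y_true, y_prob):
    """AUC as the chance that a random positive scores above a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the labels; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# 369 patient indices split into 259 training and 110 test cases, as in the abstract.
train_ids, test_ids = split_train_test(range(369), n_test=110)

# Toy labels (1 = event) and predicted probabilities, NOT data from the study.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(len(train_ids), len(test_ids))
print(auc(y_true, y_prob), accuracy(y_true, y_prob), log_loss(y_true, y_prob))
```

Reporting all three metrics together is useful because AUC measures only ranking, accuracy depends on the chosen threshold, and logarithmic loss penalizes overconfident probabilities, which relates to the calibration curves mentioned in the Methods.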
Authors
徐瑾业
周江晖
刘生伟
陈良亮
胡俊熙
王霄霖
束余声
XU Jinye; ZHOU Jianghui; LIU Shengwei; CHEN Liangliang; HU Junxi; WANG Xiaolin; SHU Yusheng (Medical College of Yangzhou University, Yangzhou, 225000, Jiangsu, P.R. China; Department of Thoracic Surgery, Northern Jiangsu People's Hospital, Clinical Medicine College of Yangzhou University, Yangzhou, 225000, Jiangsu, P.R. China; Department of Thoracic Surgery, The First Hospital Affiliated to Army Medical University, Chongqing, 400038, P.R. China)
Source
《中国胸心血管外科临床杂志》
CSCD
Peking University Core Journals
2022, Issue 12, pp. 1574-1579 (6 pages)
Chinese Journal of Clinical Thoracic and Cardiovascular Surgery
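Funding
Jiangsu Provincial Health Commission Geriatric Health Research Project (LKZ2022019)
Yangzhou Science and Technology Bureau Social Development-Clinical Frontier Technology Project (YZ2021078).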
Keywords
Esophageal neoplasms
machine learning
surgery
prognosis
prediction model
survival risk stratification