Construction and validation of a machine learning-based prediction model for lymph node metastasis in early gastric cancer
Abstract
Objective To construct an optimal prediction model for lymph node metastasis (LNM) in early gastric cancer (EGC) using machine learning (ML), and to validate its predictive performance.
Methods Clinical data were collected from 433 EGC patients who underwent radical surgery in our hospital from January 2015 to December 2022. The patients were divided into a training set and a validation set at a 7:3 ratio. LASSO regression was used to screen variables, and multivariate logistic regression was used to identify independent risk factors for LNM in EGC. Prediction models were built with 10 ML algorithms: categorical boosting (CatBoost), light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), random forest (RF), gradient boosting machine (GBM), neural networks (NNET), support vector machine (SVM), K-nearest neighbors (KNN), naive Bayes (NB), and logistic regression. The models were evaluated and compared on accuracy, precision, recall, F1 score, sensitivity, specificity, positive predictive value, negative predictive value, Kappa value, area under the receiver operating characteristic curve (AUC), calibration curve, decision curve, and precision-recall curve. SHapley Additive exPlanations (SHAP) was then applied to explain how much each variable in the best model contributed to the predicted outcome.
Results Depth of tumor invasion, lymphovascular invasion, and smoking history were independent risk factors for LNM in EGC. The CatBoost model showed the best predictive performance, outperforming the other models on 5 performance indicators in the training set: AUC 0.904 (95%CI 0.868-0.940), F1 score 0.633, Brier score 0.100, negative predictive value 0.975, and Kappa 0.520. The SHAP values of the CatBoost model identified depth of tumor invasion and lymphovascular invasion as the 2 key feature variables for predicting LNM.
Conclusion Submucosal tumor invasion, lymphovascular invasion, and smoking history are independent risk factors for LNM in early gastric cancer. ML can be used to predict LNM risk; the CatBoost model had the best predictive performance and can provide guidance for clinical diagnosis and treatment decisions.
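Several of the indicators named in the abstract (F1 score, negative predictive value, Kappa, Brier score, AUC) follow directly from a model's predicted probabilities and the observed outcomes. The sketch below shows one way to compute them in plain Python. It is an illustration only, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions and are not taken from the study.

```python
from typing import Dict, Sequence


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Metrics that need a hard prediction (1 = LNM present, 0 = absent).

    The 0.5 threshold is an assumption made for this illustration.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + fp + tn + fn

    precision = safe_div(tp, tp + fp)      # positive predictive value
    recall = safe_div(tp, tp + fn)         # sensitivity
    accuracy = safe_div(tp + tn, n)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "npv": safe_div(tn, tn + fn),      # negative predictive value
        "f1": safe_div(2 * precision * recall, precision + recall),
        "kappa": safe_div(accuracy - p_chance, 1 - p_chance),
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case scores higher than a
    random negative case, with ties counted as one half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    print(threshold_metrics(labels, probs))
    print("Brier:", brier_score(labels, probs))
    print("AUC:", roc_auc(labels, probs))
```

AUC and the Brier score use the probabilities directly, while F1, Kappa, and the predictive values depend on the chosen threshold, so a comparison across models is only meaningful when the same threshold rule is applied to each.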
Authors MENG Xiangyong (孟祥勇); QIN Jiayi (秦嘉怡); CHEN Wensheng (陈文生). Department of Gastroenterology, First Affiliated Hospital, Army Medical University (Third Military Medical University), Chongqing, 400038, China
Source Journal of Army Medical University (《陆军军医大学学报》), 2024, Issue 21, pp. 2432-2442 (11 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心)
Keywords early gastric cancer; lymph node metastasis; machine learning; prediction model