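Construction and Interpretability Analysis of a Multi-Class Machine Learning Model for Predicting HER2 Expression Status Based on Dual-Modal Breast Imaging Features

Published English title: Multi-Class Machine Learning Predictive Model for HER2 Expression Status Based on Dual-Modal Breast Imaging Features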
Abstract

Objective: To construct multi-class machine learning models based on dual-modal mammography and ultrasound imaging features for preoperative prediction of human epidermal growth factor receptor 2 (HER2) expression status in breast cancer patients.

Methods: A total of 632 female breast cancer patients meeting the inclusion and exclusion criteria were enrolled, including 141 with no HER2 expression, 311 with low expression, and 180 with high expression. Imaging signs of the breast cancer lesions were extracted from full-field digital mammography (FFDM), digital breast tomosynthesis (DBT), and breast ultrasound (US) images. After data preprocessing, three-class prediction models for HER2 expression status were built with five machine learning algorithms. The models were trained with five-fold cross-validation, and the overall accuracy, precision, recall, F1-score, and area under the receiver operating characteristic (ROC) curve (AUC) were calculated for each three-class model. The Bootstrap method was used to compare AUCs between models and to select the best-performing three-class model. The SHAP method was used to assess the importance of each feature to the model's prediction of HER2 expression status.

Results: On the test set, the random forest (RF) model showed the best three-class performance for HER2 expression status, with a macro-average AUC of 0.723, a micro-average AUC of 0.783, an overall accuracy of 57.4%, and a macro-average recall of 53.0%. SHAP analysis showed that the five most important global features influencing the output of the RF model were, in descending order: calcifications observed on mammography, diastolic blood pressure, fine linear or linear branching calcifications, maximum lesion diameter measured by US, and CA153.

Conclusion: A multi-class machine learning model based on dual-modal mammography and ultrasound imaging features can predict the HER2 expression status of breast cancer patients before surgery, and combining it with the SHAP method improves the interpretability of the model.
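The evaluation described in the abstract (five-fold cross-validation, macro- and micro-average AUC, accuracy, macro-average recall, and Bootstrap resampling of the AUC) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the source: scikit-learn is used, the data are synthetic stand-ins with roughly the reported class proportions, the random forest hyperparameters are arbitrary, metrics are computed on out-of-fold predictions rather than a separate test set, and the Bootstrap step only produces a percentile interval for one model instead of comparing models. It is not the authors' pipeline and does not reproduce their results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.preprocessing import label_binarize

# Synthetic stand-in data (assumption): 632 samples, three classes with
# proportions close to 141/311/180. These are NOT the study's features.
X, y = make_classification(
    n_samples=632, n_features=20, n_informative=8, n_classes=3,
    weights=[0.22, 0.49, 0.29], random_state=0,
)

# Random forest with arbitrary hyperparameters (assumption).
model = RandomForestClassifier(n_estimators=200, random_state=0)

# Five-fold stratified cross-validation; collect out-of-fold probabilities.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
pred = proba.argmax(axis=1)

# Macro-average AUC: unweighted mean of the three one-vs-rest AUCs.
macro_auc = roc_auc_score(y, proba, multi_class="ovr", average="macro")

# Micro-average AUC: pool all (sample, class) pairs into one binary problem.
y_onehot = label_binarize(y, classes=[0, 1, 2])
micro_auc = roc_auc_score(y_onehot, proba, average="micro")

accuracy = accuracy_score(y, pred)
macro_recall = recall_score(y, pred, average="macro")

# Bootstrap percentile interval for the macro-average AUC.
rng = np.random.default_rng(0)
boot_aucs = []
for _ in range(200):
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) < 3:
        continue  # skip resamples that miss a class
    boot_aucs.append(
        roc_auc_score(y[idx], proba[idx], multi_class="ovr", average="macro")
    )
ci_low, ci_high = np.percentile(boot_aucs, [2.5, 97.5])

print(f"macro AUC {macro_auc:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"micro AUC {micro_auc:.3f}, accuracy {accuracy:.3f}, "
      f"macro recall {macro_recall:.3f}")
```

The micro-average pools all sample-class pairs and so is dominated by the larger classes, while the macro-average weights the three classes equally. With imbalanced classes, that difference is one possible reason for the two averages to diverge.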
Authors SHU Yujing (舒予静); HE Zilong (何子龙); ZENG Weixiong (曾伟雄); LIU Jialing (刘家玲); GUO Yaya (郭丫丫); CHEN Weiguo (陈卫国). Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong Province 510515, P.R. China
Source Journal of Clinical Radiology (《临床放射学杂志》; Peking University Core Journal), 2024, Issue 8, pp. 1310-1316 (7 pages)
Funding National Natural Science Foundation of China (Grant No. 82171929).
Keywords Breast cancer; HER2; Expression status; Mammography; Ultrasound; Machine learning