Breast Cancer Prediction and Feature Analysis Model Based on CatBoost and SHAP (cited by: 4)
Abstract: To address the insufficient performance and poor interpretability of current breast cancer prediction models, this paper proposes a breast cancer prediction and feature analysis model that combines CatBoost and SHAP. First, the original breast cancer dataset undergoes outlier handling and data normalization to improve data quality. Then, a breast cancer prediction model is built on CatBoost and its generalization ability is analyzed. Finally, the prediction model is combined with SHAP for interpretability analysis, in order to explore the key factors affecting breast cancer. The model is validated on the Breast Cancer Wisconsin (Diagnostic) dataset from the University of Wisconsin. The results show an Accuracy of 99.30%, a Precision of 99.50%, a Recall of 98.91%, and an F1 value of 99.19%, all better than the results in the existing literature. Specifically, Accuracy improves by 1.12 to 6.90 percentage points, Precision by 2.00 to 7.50 percentage points, Recall by 2.41 to 6.91 percentage points, and F1 by 2.19 to 7.19 percentage points, which verifies the superiority of the proposed model. In addition, the SHAP analysis identifies the core factors affecting breast cancer as concave points_worst (extreme value of the concave points of breast tissue cell nuclei), perimeter_worst (extreme value of the cell nucleus perimeter), area_worst (extreme value of the cell nucleus area), and others, which provides physicians with a principled basis for diagnosis.
Author: JIA Xiao-yao (贾潇瑶) (School of Mathematics and Data Science, Changji University, Changji 831100, China; College of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China)
Source: Computer and Modernization (《计算机与现代化》), 2023, No. 10, pp. 32-38 (7 pages)
Keywords: CatBoost algorithm; interpretable; breast cancer; disease prediction; feature analysis; machine learning
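
The abstract reports four evaluation metrics: Accuracy, Precision, Recall, and F1. As a reminder of how these are conventionally defined for a binary classifier, the following is a minimal sketch in plain Python. The function name `classification_metrics`, the choice of label `1` as the positive (malignant) class, and the toy labels are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation protocol (for example, any averaging over folds) is not described in this record.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    `positive` marks the class treated as the positive outcome
    (assumed here to be 1, e.g. a malignant diagnosis).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In a diagnostic setting, Recall is often watched most closely, since a false negative corresponds to a missed malignant case, while Precision reflects how many positive predictions were correct.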