Machine Learning-Based Breast Cancer Prediction and SHAP Feature Analysis
Abstract Breast cancer is the cancer with the most new cases worldwide and seriously impairs quality of life; predicting it and understanding its pathogenesis remain open research problems. To meet the need for accurate breast cancer diagnosis, this paper applies machine learning algorithms to improve the accuracy of breast cancer prediction models, in order to support doctors' decision making, help achieve "three early" prevention (early detection, early diagnosis, and early treatment), and provide new clues for further research into the causes of the disease. The study uses the public Wisconsin breast cancer dataset published on the Kaggle platform. First, after data preprocessing, recursive feature elimination based on random forests is used to rank variable importance and select features. Second, grid search is used to optimize the hyperparameters, the LightGBM algorithm is used to build the prediction model, and SHAP values are introduced to make the model more interpretable and to further reveal breast cancer risk factors and their mechanisms of action. Finally, the model's predictive performance is evaluated with metrics such as AUC. The results show that the model outperforms traditional models, with a prediction accuracy of 97% and an AUC of 0.97, effectively improving the correct identification of breast cancer.
Authors LIU Mingming; WANG Guangjing; ZHAO Zihan; LUO Mouyu; XIE Jing (School of Public Base, Bengbu Medical University, Bengbu 233030, Anhui, China; School of Public Health, Bengbu Medical University, Bengbu 233030, Anhui, China)
Source Intelligent Computer and Applications, 2024, Issue 10, pp. 194-200 (7 pages)
Funding Key Project of Humanities and Social Sciences in Universities of Anhui Province (SK2020A0357); Key Natural Science Project of Bengbu Medical College (KYBY1704ZD).
Keywords machine learning; breast cancer prediction; LightGBM algorithm; SHAP values
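
The abstract leans on two quantities: SHAP values, which split a single prediction into per-feature contributions, and AUC, which summarizes how well the scores rank positive cases above negative ones. The following is a minimal pure-Python sketch of what each one computes. It is not the paper's model or data: `toy_model`, the feature vector `x`, the all-zero `baseline`, and the `labels`/`scores` lists are hypothetical values chosen for illustration. Shapley values are computed here by brute-force enumeration of feature subsets, which is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is absent from a coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy score with an interaction between features 0 and 2.
def toy_model(z):
    x0, x1, x2 = z
    return 0.5 * x0 + 0.2 * x1 + 0.3 * x0 * x2


x = [2.0, 1.0, 1.0]          # hypothetical case to explain
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
print("per-feature contributions:", phi)
print("sum of contributions:", sum(phi))
print("prediction minus baseline:", toy_model(x) - toy_model(baseline))

# Hypothetical labels (1 = positive class) and predicted scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print("AUC:", auc(labels, scores))
```

The contributions always sum to the difference between the prediction and the baseline prediction, which is what lets them be read as an additive breakdown of one prediction. The interaction term in `toy_model` is shared equally between the two features involved.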