4 articles found
A Study on the Explainability of Thyroid Cancer Prediction: SHAP Values and Association-Rule Based Feature Integration Framework
1
Authors: Sujithra Sankar, S. Sathyalakshmi. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 3111-3138 (28 pages)
In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable progress. Accurate predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce overtreatment. However, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and transparency. This paper proposes a novel association-rule based feature-integrated machine learning model which shows better classification and prediction accuracy than present state-of-the-art models. Our study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction models. In the proposed method, the association-rule based feature integration framework identifies frequently occurring attribute combinations in the dataset. The original dataset is used in training machine learning models, and further used in generating SHAP values from these models. In the next phase, the dataset is integrated with the dominant feature sets identified through association-rule based analysis. This new integrated dataset is used in re-training the machine learning models. The new SHAP values generated from these models help in validating the contributions of feature sets in predicting malignancy. The conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making systems. In this study, the SHAP values are introduced along with association-rule based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the predictions. The study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer, and a validation framework of explainability. The proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUROC) are also higher than the baseline models. The results of the proposed model help us identify the dominant feature sets that impact thyroid cancer classification and prediction. The features {calcification} and {shape} consistently emerged as the top-ranked features associated with thyroid malignancy, in both association-rule based interestingness metric values and SHAP methods. The paper highlights the potential of the rule-based integrated models with SHAP in bridging the gap between the machine learning predictions and the interpretability of this prediction which is required for real-world medical applications.
Keywords: Explainable AI; machine learning; clinical decision support systems; thyroid cancer; association-rule based framework; SHAP values; classification and prediction
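The abstract above refers to association-rule interestingness metrics without naming them. As an illustration only, the sketch below computes three standard ones (support, confidence, lift) for a single rule. The toy records, the item names, and the helper `rule_metrics` are hypothetical and are not taken from the paper; the paper's actual metrics and dataset may differ.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # records containing the antecedent
    count_c = sum(1 for t in transactions if c <= t)         # records containing the consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical toy records (NOT data from the paper): each record is a set of items.
records = [
    {"calcification", "irregular_shape", "malignant"},
    {"calcification", "malignant"},
    {"irregular_shape", "malignant"},
    {"calcification", "irregular_shape", "malignant"},
    {"benign"},
    {"calcification", "benign"},
]

support, confidence, lift = rule_metrics(
    records, {"calcification", "irregular_shape"}, {"malignant"}
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 suggests the attribute combination co-occurs with the outcome more often than independence would predict, which is one plausible way such combinations could be ranked.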
Real-Time Fraud Detection Using Machine Learning
2
Author: Benjamin Borketey. Journal of Data Analysis and Information Processing, 2024, No. 2, pp. 189-209 (21 pages)
Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I tackle class imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector, Random Forest, XGBoost, and Light Gradient-Boosting Machine to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as “V12” and “V14”. SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.
Keywords: Credit Card Fraud Detection; Machine Learning; SHAP values; Random Forest
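For readers unfamiliar with SMOTE, mentioned in the abstract above, the core idea is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that idea, not the author's pipeline; the function name `smote`, its parameters, and the toy points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest minority neighbours
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points (for example, scaled features of fraud cases).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the region the minority class already occupies. In practice, oversampling is usually applied to the training split only, so that the test set keeps its original class balance.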
A Preliminary Study on a Quality Prediction Model for Shandong Tobacco Leaves Based on the XGBoost Algorithm (cited by: 5)
3
Authors: 别瑞, 周婷云, 周显升, 姜滨, 周永, 邱军, 曹建敏. 《中国烟草科学》 (Chinese Tobacco Science; CSCD, Peking University Core), 2022, No. 5, pp. 80-86, 93 (8 pages)
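To explore the relationship between the chemical composition and the sensory quality of tobacco leaves, and to assess how well machine learning algorithms perform in tobacco quality evaluation, Shandong tobacco leaves were used as the test material. Twenty major chemical components were measured, including routine constituents, alkaloids, organic acids, polyphenols, and mono- and disaccharides, alongside a sensory quality evaluation, and the samples were divided into three quality grades (good, medium, and poor) according to sensory quality. A genetic algorithm was used to optimize the hyperparameters of XGBoost, and a quality-grade prediction model for Shandong tobacco leaves based on chemical composition was built. The SHAP value model-explanation framework was also introduced for global explanation and feature dependence analysis. The model discriminated the quality grades of Shandong tobacco leaves with 85% accuracy and performed best on the third quality grade. The SHAP value global explanation showed that the seven feature indicators influencing the quality of Shandong flue-cured tobacco ranked by contribution as: acid-to-phenol ratio > sucrose > chlorine > nicotine > nornicotine > citric acid > sugar-to-alkaloid ratio. The sugar-to-alkaloid ratio, sucrose, and the acid-to-phenol ratio were the chemical indicators contributing most to discriminating the good, medium, and poor grades, respectively. The XGBoost-based quality prediction model is effective, reliable, and highly interpretable for discriminating tobacco quality grades, and offers some guidance for tobacco quality evaluation and tobacco production.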
Keywords: Shandong tobacco leaves; XGBoost; machine learning; SHAP value; quality prediction
Can Machine Learning Methods Identify the Probability of Systemic Financial Risk in China?
4
Authors: 王达, 周映雪. 《金融市场研究》 (Financial Market Research), 2023, No. 7, pp. 48-58 (11 pages)
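This paper uses gradient boosting trees, a machine learning model, together with a macroeconomic dataset of 25 feature variables from 17 countries including the United States, to build a risk identification model that comprehensively analyzes the probability of systemic risk in China. SHAP values are used to explain the model and to explore the factors influencing risk in China under a nonlinear, nonparametric model. The empirical results show that the gradient boosting tree model captures systemic risk significantly better than the traditional logistic regression model and tracks the trend of China's risk probability well. The SHAP value decomposition shows that credit factors, monetary factors, financial marketization factors, and gross domestic savings are the main factors driving risk upward, and that all of them exhibit clear threshold effects.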
Keywords: systemic risk; machine learning; gradient boosting trees; SHAP value
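Several abstracts in this listing rely on SHAP value decomposition. To make the idea concrete, the sketch below computes exact Shapley values for a tiny hand-written model by enumerating feature subsets, with absent features replaced by a single baseline value. This is a simplification for illustration: the toy model, the baseline, and the helper `shapley_values` are assumptions, and practical SHAP tools typically use a background dataset and faster approximate or tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(rest, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model with one additive term and one interaction term.
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that lets a single prediction be decomposed into per-feature effects. Averaging the absolute contributions over many samples is one common way to obtain a global feature ranking.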