Abstract
Objective To address the limited forensic practicability and lack of interpretability of existing wound age estimation models, the SHAP algorithm was used to build a feature-interpretable machine learning model, offering a new strategy for wound age estimation. Methods Based on the relative expression levels of 35 genes previously found to be closely associated with wound age in contused skeletal muscle, wound age estimation models were constructed using four algorithms: Multilayer Perceptron (MLP), Random Forest (RF), LightGBM (LGBM), and Support Vector Machine (SVM). The SHAP (SHapley Additive exPlanations) algorithm was used to rank gene features by importance and eliminate redundant features, and the resulting models were compared to identify the optimal one. The gene features retained by the optimal model were then assessed sample by sample using SHAP's local explanations. Results After SHAP-based feature selection, the MLP model performed best. Using only 15 gene features, it classified wound age into four intervals (4-12 h, 16-24 h, 28-36 h, and 40-48 h) with an area under the receiver operating characteristic curve (AUC) of 0.99. SHAP analysis identified Fam210a as the gene most relevant to wound age estimation. Local analysis further showed that high expression of Fam210a increased the predicted probability of the 4-12 h interval, high expression of Rae1 increased that of the 16-24 h interval, low expression of Tbx18 increased that of the 28-36 h interval, and high expression of Tbx18 increased that of the 40-48 h interval. Conclusions The wound age estimation model built by combining MLP with SHAP can accurately predict wound age. In addition, the SHAP explainer gives a clearer view of how much each feature gene contributes to the model's predictions, laying a foundation for further research on wound age estimation.
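The feature ranking and local explanation steps described in the abstract both rest on Shapley values. The following is a minimal from-scratch sketch of that idea, not the study's pipeline and not the SHAP library's API: it computes exact Shapley values by enumerating feature subsets, which is only feasible for a handful of features. The scoring function `toy_score`, the feature names, the sample values, and the all-zero baseline are hypothetical and merely stand in for one class probability of a trained classifier.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical model output; gene_c has no effect, so it is redundant."""
    gene_a, gene_b, gene_c = features
    return 0.5 * gene_a + 0.2 * gene_a * gene_b


names = ["gene_a", "gene_b", "gene_c"]            # hypothetical feature names
samples = [[1.0, 2.0, 3.0], [2.0, 0.5, 1.0], [0.0, 1.0, 4.0]]
baseline = [0.0, 0.0, 0.0]                        # assumed reference expression

# Local explanation: one attribution vector per sample
local = [shapley_values(toy_score, s, baseline) for s in samples]

# Global importance: mean absolute Shapley value per feature
importance = [sum(abs(p[i]) for p in local) / len(local) for i in range(len(names))]
ranking = sorted(range(len(names)), key=lambda i: importance[i], reverse=True)

# Feature selection: drop features that never contribute
kept = [names[i] for i in ranking if importance[i] > 1e-12]
```

Here `local[k]` plays the role of a local explanation for sample `k`, and its entries sum to the difference between the sample's score and the baseline score. `ranking` and `kept` mirror the global ranking and the removal of redundant features. For models with dozens of features, approximate explainers are used instead of full enumeration.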
Authors
吕慧敏
刘明锋
靳茜茜
张艺博
安国帅
杜秋香
王英元
孙俊红
Lv Huimin; Liu Mingfeng; Jin Qianqian; Zhang Yibo; An Guoshuai; Du Qiuxiang; Wang Yingyuan; Sun Junhong (School of Forensic Medicine, Shanxi Medical University, Jinzhong 030600)
Source
《中国法医学杂志》
CSCD
2024, Issue 3, pp. 320-326 (7 pages)
Chinese Journal of Forensic Medicine
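Funding
National Natural Science Foundation of China (81971795)
Shanxi Province Youth Science and Technology Research Fund, General Youth Project (201901D211334)
Shanxi Province Special Project for Science and Technology Innovation Talent Teams (202204051001025)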
Keywords
Forensic pathology
Wound age estimation
Machine learning algorithms
SHAP
Feature explanation