
An interpretability model for syndrome differentiation of HBV-ACLF in traditional Chinese medicine using small-sample imbalanced data

Abstract

Objective: Clinical medical record data for hepatitis B-related acute-on-chronic liver failure (HBV-ACLF) generally have small sample sizes and class imbalance. However, most machine learning models are designed for balanced data and lack interpretability. This study aimed to propose a traditional Chinese medicine (TCM) diagnostic model for HBV-ACLF, grounded in the TCM theory of syndrome differentiation and treatment, that is both clinically interpretable and highly accurate.

Methods: We collected medical records from 261 patients diagnosed with HBV-ACLF, covering three syndromes: Yang jaundice (214 cases), Yang-Yin jaundice (41 cases), and Yin jaundice (6 cases). To avoid overfitting of the machine learning model, we excluded the Yin jaundice cases. After data standardization and cleaning, we obtained 255 medical records of Yang jaundice and Yang-Yin jaundice. To address the class imbalance, we applied an oversampling method and built syndrome diagnosis models with five machine learning methods: logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Precision, F1 score, the area under the receiver operating characteristic (ROC) curve (AUC), and accuracy served as evaluation metrics. The model with the best classification performance was selected for diagnostic rule extraction, and the clinical significance of the rules was analyzed in depth. Furthermore, we proposed a novel multiple-round stable rule extraction (MRSRE) method to obtain a stable rule set of features that demonstrates the model's clinical interpretability.

Results: The precision of all five machine learning models built on the oversampled, balanced data exceeded 0.90. Among them, the RF model classified syndrome types with an accuracy of 0.92, mean F1 scores of 0.93 for Yang jaundice and 0.94 for Yang-Yin jaundice, and an AUC of 0.98. The rules extracted from the RF syndrome differentiation model with the MRSRE method showed that the features common to Yang jaundice and Yang-Yin jaundice were wiry pulse; yellowing of the urine, skin, and eyes; normal tongue body; normal sublingual vessels; nausea; aversion to oily food; and poor appetite. The main features of Yang jaundice were a red tongue body and thickened sublingual vessels, whereas those of Yang-Yin jaundice were a dark tongue body, pale white tongue body, white tongue coating, lack of strength, slippery pulse, light red tongue body, slimy tongue coating, and abdominal distension. These findings agree with the classifications made by TCM experts on the basis of TCM syndrome differentiation and treatment theory.

Conclusion: Our model can be used to differentiate HBV-ACLF syndromes, and the approach has the potential to generate other clinically interpretable, highly accurate models for clinical data characterized by small sample sizes and class imbalance.
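The abstract states that oversampling was used to balance the two retained classes but does not say which variant. As a minimal sketch, assuming plain random oversampling (duplicating minority-class records with replacement), the balancing step could look like the following. The function name `random_oversample`, the class labels, and the integer record IDs are hypothetical placeholders; only the class sizes (214 and 41) come from the abstract.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate minority-class records at random until every class
    is as large as the largest class (assumed variant, not confirmed
    by the source)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        # All original records that belong to this class
        pool = [r for r, y in zip(records, labels) if y == cls]
        # Draw with replacement to fill the gap to the majority size
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_records.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_records, out_labels


# Class sizes from the abstract; the records are placeholder IDs
labels = ["Yang"] * 214 + ["Yang-Yin"] * 41
records = list(range(len(labels)))

bal_records, bal_labels = random_oversample(records, labels)
balanced_counts = Counter(bal_labels)
print(balanced_counts)
```

In general practice, oversampling is applied to the training split only, so that duplicated records do not leak into the evaluation data. Whether the study followed this procedure is not stated in the abstract.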
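Authors: ZHOU Zhan, PENG Qinghua, XIAO Xiaoxia, ZOU Beiji, LIU Bin, GUO Shuixia (School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha, Hunan 410208; School of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208; School of Mathematics and Statistics, Hunan Normal University, Changsha, Hunan 410081)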
Source: Digital Chinese Medicine (indexed in CAS, CSCD), 2024, Issue 2, pp. 137-147 (11 pages)
Funding: Key Research Project of Hunan Provincial Administration of Traditional Chinese Medicine (A2023048); Key Research Foundation of Education Bureau of Hunan Province, China (23A0273).
Keywords: Traditional Chinese medicine (TCM); Hepatitis B-related acute-on-chronic liver failure (HBV-ACLF); Imbalanced data; Random forest (RF); Interpretability