Abstract
Objective Clinical medical record data for hepatitis B-related acute-on-chronic liver failure (HBV-ACLF) generally have small sample sizes and class imbalance. However, most machine learning models are designed for balanced data and lack interpretability. This study aimed to propose a traditional Chinese medicine (TCM) diagnostic model for HBV-ACLF, based on the TCM theory of syndrome differentiation and treatment, that is both clinically interpretable and highly accurate.
Methods We collected medical records from 261 patients diagnosed with HBV-ACLF, covering three syndromes: Yang jaundice (214 cases), Yang-Yin jaundice (41 cases), and Yin jaundice (6 cases). To avoid overfitting of the machine learning models, we excluded the Yin jaundice cases. After data standardization and cleaning, we obtained 255 medical records of Yang jaundice and Yang-Yin jaundice. To address the class imbalance, we applied oversampling and built syndrome diagnosis models with five machine learning methods: logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Precision, F1 score, area under the receiver operating characteristic (ROC) curve (AUC), and accuracy were used as model evaluation metrics. The model with the best classification performance was selected for diagnostic rule extraction, and the clinical significance of the extracted rules was analyzed in depth. Furthermore, we proposed a novel multiple-round stable rule extraction (MRSRE) method to obtain a stable feature rule set that demonstrates the model's clinical interpretability.
Results All five machine learning models built on the oversampled, balanced data achieved a precision above 0.90. Among them, the RF model classified syndrome types with an accuracy of 0.92, mean F1 scores of 0.93 and 0.94 for Yang jaundice and Yang-Yin jaundice, respectively, and an AUC of 0.98. The rules extracted from the RF syndrome differentiation model with the MRSRE method showed that the features common to Yang jaundice and Yang-Yin jaundice were wiry pulse; yellowing of the urine, skin, and eyes; normal tongue body; normal sublingual vessels; nausea; aversion to greasy food; and poor appetite. The main features of Yang jaundice were a red tongue body and thickened sublingual vessels, whereas those of Yang-Yin jaundice were a dark tongue body, pale white tongue body, white tongue coating, lack of strength, slippery pulse, light red tongue body, slimy tongue coating, and abdominal distension. These findings are consistent with the classifications made by TCM experts based on the TCM theory of syndrome differentiation and treatment.
Conclusion Our model can be used to differentiate HBV-ACLF syndromes, and the approach has the potential to generate other clinically interpretable, highly accurate models for clinical data characterized by small sample sizes and class imbalance.
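The oversampling and RF evaluation steps described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not name the oversampling variant, so random oversampling with replacement is assumed here, and applying it to the training split only is also an assumption. The data are synthetic, with 20 hypothetical binary symptom indicators and the 214/41 class counts taken from the abstract; the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic stand-in for the 255 records: 214 majority vs 41 minority cases,
# with 20 hypothetical binary symptom indicators (not the study's features).
n_major, n_minor, n_features = 214, 41, 20
X = rng.integers(0, 2, size=(n_major + n_minor, n_features))
y = np.array([0] * n_major + [1] * n_minor)
# Make the first two features informative so the task is learnable.
X[y == 1, 0] = rng.random(n_minor) < 0.8
X[y == 1, 1] = rng.random(n_minor) < 0.7

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Random oversampling (assumed variant): draw minority cases with replacement
# until they match the majority count, on the training split only.
minority = y_train == 1
n_target = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    replace=True, n_samples=n_target, random_state=0,
)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

# Random forest trained on the balanced data (hyperparameters are arbitrary).
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_bal, y_bal)

# Accuracy, per-class F1, and AUC on the untouched test split.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, pred)
f1_per_class = f1_score(y_test, pred, average=None)
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.2f}, F1 per class={f1_per_class}, AUC={auc:.2f}")
```

Keeping the test split at its original class ratio means the reported metrics reflect the imbalanced distribution rather than the resampled one; whether the study evaluated its models this way is not stated in the abstract.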
Funding
Key Research Project of Hunan Provincial Administration of Traditional Chinese Medicine (A2023048)
Key Research Foundation of Education Bureau of Hunan Province, China (23A0273)