Abstract
Objective To differentiate hypocellular myelodysplastic syndrome (hypo-MDS) from aplastic anemia (AA) using machine learning models, namely Multi-Grained Cascade Forest (gcForest), Broad Learning System (BLS), and Gradient Boosting Decision Tree (GBDT). Methods Basic information, medical history, and clinical examination data were retrospectively collected for patients with hypo-MDS or AA who were first diagnosed in the Department of Hematology, North China University of Science and Technology Affiliated Hospital, between January 1, 2008 and December 31, 2022. The final input variables were determined from factor analysis combined with a literature review and clinical expert opinion. The subjects were randomly divided into a training set (70%) and a validation set (30%), and gcForest, BLS, and GBDT models for differentiating hypo-MDS from AA were built. Model performance was compared using sensitivity, specificity, the ROC curve, AUC, Brier score, calibration curve, and decision curve analysis (DCA), and the best-performing model was selected. Results Nine indicators were selected as model inputs: age, red blood cell count, hemoglobin level, neutrophils, early erythroblasts, intermediate erythroblasts, late erythroblasts, mature lymphocytes, and mature plasma cells. On the validation set, the gcForest, BLS, and GBDT models achieved accuracies of 76.74%, 79.07%, and 83.92%; sensitivities of 62.16%, 72.92%, and 87.69%; specificities of 87.76%, 86.84%, and 80.77%; Brier scores of 0.147, 0.143, and 0.119; and AUCs of 0.767 (95% CI: 0.731-0.805), 0.785 (95% CI: 0.739-0.834), and 0.834 (95% CI: 0.808-0.861), respectively. The AUC of the GBDT model was significantly higher than that of the gcForest model (P<0.05). The calibration curve of the GBDT model lay closer to the diagonal than those of the other two models, and its area under the decision curve was the largest. Conclusion Of the three models, GBDT performed best in the differential diagnosis of hypo-MDS and AA.
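The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the label coding (1 = hypo-MDS, 0 = AA), the 0.5 classification threshold, and every function name below are assumptions made for illustration. The AUC is computed as the fraction of correctly ranked positive/negative pairs, and the net benefit is the quantity plotted on the vertical axis of a decision curve.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); a case is called positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(y_true, y_prob, threshold=0.5) -> float:
    """True positive rate: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn)


def specificity(y_true, y_prob, threshold=0.5) -> float:
    """True negative rate: tn / (tn + fp)."""
    _, fp, tn, _ = confusion_counts(y_true, y_prob, threshold)
    return tn / (tn + fp)


def accuracy(y_true, y_prob, threshold=0.5) -> float:
    """Share of cases classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return (tp + tn) / (tp + fp + tn + fn)


def brier_score(y_true, y_prob) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold: float) -> float:
    """Decision-curve net benefit: tp/n - fp/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    print(sensitivity(labels, probs), specificity(labels, probs))
    print(brier_score(labels, probs), auc(labels, probs))
    print(net_benefit(labels, probs, 0.5))
```

A lower Brier score indicates better-calibrated probabilities, which is why it is read alongside the calibration curve, while a decision curve repeats the net-benefit calculation over a range of threshold probabilities.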
Authors
宋洁
杨美荣
贾文婷
SONG Jie; YANG Meirong; JIA Wenting (Department of Radiotherapy and Chemotherapy of Tumor, North China University of Science and Technology Affiliated Hospital, Tangshan 063000, China)
Source
《中国煤炭工业医学杂志》
2024, Issue 3, pp. 313-319 (7 pages in total)
Chinese Journal of Coal Industry Medicine
Funding
Natural Science Foundation of Hebei Province (No. 20221520).
Keywords
Gradient boosting decision tree
Hypocellular myelodysplastic syndrome
Aplastic anemia
Differential diagnosis