Abstract
Objective: To propose a machine-learning-based method for predicting fatal in-hospital gastrointestinal rebleeding and for screening the indicators that predict it.
Methods: A total of 728 case episodes with a confirmed diagnosis of gastrointestinal bleeding were extracted from the emergency database of PLA General Hospital; 343 of them were confirmed as fatal in-hospital gastrointestinal rebleeding. After extraction and screening, 64 physiological or laboratory indicators were retained. Under ten-fold cross-validation, three classifiers were trained and compared: Logistic regression, adaptive boosting (AdaBoost) with decision trees as weak classifiers, and XGBoost with decision trees as weak classifiers. XGBoost was then used for a sequential forward feature search in which indicators were screened by the importance values obtained during training, yielding the key indicators for predicting fatal in-hospital gastrointestinal rebleeding.
Results: Logistic regression, AdaBoost and XGBoost all achieved good F1.5 scores at every input feature dimension. XGBoost performed best and scored highest, meaning that it identified as many as possible of the patients who might develop fatal in-hospital gastrointestinal rebleeding. The XGBoost results gave the 30 most important indicators for prediction. The F1.5 score peaked (0.893) when the top 12 indicators were used: hemoglobin (Hb), calcium (CA), red blood cell count (RBC), mean platelet volume (MPV), mean corpuscular hemoglobin concentration (MCH), systolic blood pressure (SBP), platelet count (PLT), magnesium (MG), lymphocytes (LYM), glucose (GLU, blood gas analysis), glucose (GLU, blood biochemistry) and diastolic blood pressure (DBP).
Conclusions: Logistic regression, AdaBoost and XGBoost can all provide early warning of fatal in-hospital gastrointestinal rebleeding. XGBoost performed best and, through sequential forward selection, yielded 12 key indicators.
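The abstract reports F1.5 scores and a forward search over importance-ranked indicators without spelling out either step. The sketch below is one plausible reading, not the authors' implementation: it assumes F1.5 is the standard F-beta score with beta = 1.5 (which weights recall above precision, so missed rebleeding cases cost more than false alarms), and it assumes the search scores the top-1, top-2, ... indicators in ranked order and keeps the size at which the score peaks. The names f_beta and best_top_k and the toy score values are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.5) -> float:
    """F-beta score from confusion-matrix counts; beta > 1 favours recall."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_top_k(
    ranked: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> Tuple[int, float]:
    """Score each prefix of an importance-ranked feature list.

    Returns (k, score) for the prefix length with the highest score;
    ties keep the smaller k.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        s = score(ranked[:k])
        if s > best_score:
            best_k, best_score = k, s
    return best_k, best_score


# Same number of true positives, but errors shifted between FP and FN:
# with beta = 1.5, missing patients (FN) is penalised more than false alarms.
many_false_alarms = f_beta(tp=80, fp=30, fn=10)
many_missed_cases = f_beta(tp=80, fp=10, fn=30)
assert many_false_alarms > many_missed_cases

# Toy stand-in for "cross-validated F1.5 using these features";
# the values are invented for illustration and are not from the study.
toy_scores = {1: 0.70, 2: 0.85, 3: 0.80, 4: 0.80}
k, s = best_top_k(["Hb", "CA", "RBC", "MPV"], lambda feats: toy_scores[len(feats)])
print(k, s)
```

In practice the score callable would wrap a cross-validated model fit on the selected columns; it is left abstract here so the sketch runs without any external library.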
Authors
魏子健
李静
李雪岩
赵宇卓
贾立静
黎檀实
Wei Zijian; Li Jing; Li Xueyan; Zhao Yuzhuo; Jia Lijing; Li Tanshi (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China; Management School, Beijing Union University, Beijing 100101, China)
Source
《中华危重病急救医学》
CAS
CSCD
PKU Core Journal
2019, Issue 3, pp. 359-362 (4 pages)
Chinese Critical Care Medicine
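Funding
National Natural Science Foundation of China (81272060)
National Natural Science Foundation of China, Young Scientists Fund (81701961, 71103014)
Beijing Nova Program (XX2018019)
PLA General Hospital Medical Big Data Research Project (2017MBD-30)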