Abstract
This study develops machine learning models that predict whether an individual is at risk from COVID-19 (Corona Virus Disease 2019) and that can support medical decisions, such as whether to seek medical care or to self-isolate at home. COVID-19 risk prediction models were built with three ensemble learning algorithms (Gradient Boosting, XGBoost, and Random Forest) and four non-ensemble algorithms (decision tree, logistic regression, support vector machine, and KNN (K-Nearest Neighbor)). The models' performance was validated, and COVID-19 risk factors were identified. Both the ensemble and the non-ensemble models achieved an area under the ROC (Receiver Operating Characteristic) curve of approximately 0.94. The important risk factors identified include age, whether the patient was hospitalized, whether the individual carried the virus, pregnancy, pneumonia, and whether the patient received oxygen intubation. The results indicate that, with a large sample size, ensemble learning does not necessarily outperform non-ensemble learning.
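All seven models are compared by the area under the ROC curve (AUC). As a minimal sketch, assuming binary labels (1 = at risk, 0 = not at risk) and a real-valued risk score produced by any of the classifiers, the AUC can be computed from ranks using the Mann-Whitney U identity: it equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counting one half. The function name `roc_auc` and the toy data are hypothetical and do not come from the study; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for positive (at risk), 0 for negative.
    scores: model output; a higher value means higher predicted risk.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Assign 1-based ranks in ascending score order; tied scores share
    # the average rank of their group so that ties contribute 1/2.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos(n_pos + 1)/2; AUC = U / (n_pos * n_neg).
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Sorting makes this O(n log n). That cost matters for the large samples the study emphasizes, where a naive comparison of every positive/negative pair would be quadratic.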
Authors
GUO Wei, CHEN Guang-xin, YU Miao, YU Guang-hao, GUO Jin-xing
(School of Life Sciences, Mudanjiang Medical College, Mudanjiang, Heilongjiang 157011, China; School of Medical Imaging, Mudanjiang Medical College, Mudanjiang, Heilongjiang 157011, China; Red Flag Hospital Affiliated to Mudanjiang Medical College, Mudanjiang, Heilongjiang 157011, China)
Source
New Generation of Information Technology
2023, No. 13, pp. 12-17 (6 pages)
Funding
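2021 Fundamental Research Funds for the Provincial Universities of Heilongjiang Province (No. 2021-KYYWF-0494)
Science and Technology Project of the Health Commission of Heilongjiang Province (No. 20220909040626)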
Keywords
novel coronavirus pneumonia
COVID-19
machine learning
allocation of medical resources
prediction model