Simulation comparison of various prediction model construction strategies under clustering effect

Abstract

Objective: When multicenter data are used to build clinical prediction models, the assumption that observations are independent is violated, because study subjects show a clear center clustering effect. To account for this clustering, this study compares the performance, under different scenarios, of two models that consider the center clustering effect, the random intercept logistic regression model (RI) and the fixed effects model (FEM), with two approaches that ignore it, the standard logistic regression model (SLR) and the random forest algorithm (RF).

Methods: The development of prediction models was simulated under varying degrees of center clustering. The predictive performance of each model was assessed at the center level, including differences in discrimination and calibration across scenarios, and the way these differences changed with the event rate was compared.

Results: At the center level, the models other than RF differed little in discrimination across the clustering scenarios, and their mean C-index changed very little. When highly clustered multicenter data were used for prediction, the marginal predictions (M.RI, SLR and RF) had calibration intercepts slightly below 0 compared with the conditional predictions, meaning that they overestimated the average predicted probability. RF, however, showed good intercept calibration when there were many centers and a large sample, which reflects the advantage of machine learning algorithms in handling large samples. When there were many centers with few patients each, conditional predictions from FEM had a calibration intercept greater than 0, so the average predicted probability was underestimated. In addition, when prediction models were developed from large multicenter samples, the three conditional predictions (FEM, A.RI and C.RI) had well-calibrated slopes, whereas the marginal predictions (M.RI and SLR) had calibration slopes greater than 1, indicating underfitting, and this underfitting became more pronounced as the center clustering effect increased. In particular, when there were few centers and few patients, overfitting masked the difference in calibration performance between marginal and conditional predictions. Finally, the lower the event rate, the more pronounced the influence of the center clustering effect on the center-level predictive performance of the different models.

Conclusion: When highly clustered multicenter data are used to build a model that will be applied for prediction in a specific setting, RI and FEM can be chosen for conditional prediction if the number of centers is small or if the centers differ substantially because of different incidence rates. If the number of centers and the sample size are both large, RI can be chosen for conditional prediction or RF for marginal prediction.
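The calibration measures named in the abstract can be illustrated with a small sketch. This is not the authors' simulation code. It assumes a single covariate with coefficient 1, 30 centers of 500 patients each, and a random-intercept standard deviation of 1.5. All function names are hypothetical, and the pooled fit stands in for SLR only in the loosest sense. The calibration intercept is estimated by a logistic fit with the linear predictor as an offset, and the calibration slope by a logistic fit on the linear predictor.

```python
import math
import random


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_intercept(y, lp, iters=25):
    """Fit logit P(y=1) = a + offset(lp) by Newton's method; return a."""
    a = 0.0
    for _ in range(iters):
        grad = hess = 0.0
        for yi, li in zip(y, lp):
            p = expit(a + li)
            grad += yi - p
            hess += p * (1.0 - p)
        if hess < 1e-12:
            break
        a += grad / hess
    return a


def calibration_slope(y, lp, iters=25):
    """Fit logit P(y=1) = a + b * lp by Newton's method; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, li in zip(y, lp):
            p = expit(a + b * li)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * li
            h00 += w
            h01 += w * li
            h11 += w * li * li
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        # Newton step: solve the 2x2 system H * delta = gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate_center_slopes(n_centers=30, n_per_center=500, sigma=1.5, seed=1):
    """Center-level calibration slopes of a pooled model that ignores centers.

    All settings are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    centers = []
    for _ in range(n_centers):
        u = rng.gauss(0.0, sigma)  # random intercept of this center
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_center)]
        ys = [1 if rng.random() < expit(u + x) else 0 for x in xs]
        centers.append((xs, ys))
    # Pooled logistic regression on x alone (a one-covariate fit is the
    # same computation as calibration_slope, so the function is reused).
    all_x = [x for xs, _ in centers for x in xs]
    all_y = [y for _, ys in centers for y in ys]
    a_m, b_m = calibration_slope(all_y, all_x)
    slopes = []
    for xs, ys in centers:
        lp = [a_m + b_m * x for x in xs]  # marginal linear predictor
        slopes.append(calibration_slope(ys, lp)[1])
    return slopes
```

A calibration intercept below 0 means the predictions are on average too high, and a slope above 1 means the predictions are too narrow. Because the pooled coefficient is attenuated relative to the within-center coefficient, the center-level slopes returned by `simulate_center_slopes()` would be expected to lie mostly above 1, in line with the direction reported for the marginal predictions.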

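Authors: YU Jian; PENG Chi; JIN Zhichao (Department of Health Statistics, Naval Medical University, Shanghai 200433, P.R. China)

Affiliation: Department of Health Statistics, Naval Medical University

Source: Chinese Journal of Evidence-based Medicine (《中国循证医学杂志》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue 7, pp. 834-842 (9 pages)

Funding: Naval Medical University "三航" program; Shanghai Three-Year Action Plan for Public Health System Construction, discipline construction project (No. GWV-10.1-XK05).

Keywords: Clustering effect; Clinical prediction model; Discrimination; Calibration; Simulation study; Heterogeneity

Classification: R195.1 [Medicine and Health: Health Statistics]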
