Abstract
Objective: Ensemble learning is an approach that has been widely applied in machine learning in recent years to improve predictive accuracy. This paper introduces an ensemble learning method based on the super learner algorithm for predictive modeling of longitudinal censored data, together with its implementation in R.
Methods: The basic principle of the super learner algorithm and its application to modeling longitudinal censored data are described, along with how to fit such models in R. Tumor survival data from the TCGA database are then analyzed as a worked example to illustrate the method's performance in practice.
Results: With the super learner approach, both the choice of parameter estimation methods and the specification of algorithm parameters are flexible. In the real-data analysis, the super learner algorithm made full use of the available data to build the prediction model, which achieved a prediction accuracy of 0.8737 (95% CI: 0.7897-0.9330) and a C-index of 0.883, indicating good predictive performance.
Conclusion: Ensemble learning based on the super learner algorithm offers a new option for predictive modeling of longitudinal censored data.
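The general idea behind a super learner can be sketched in a few lines: fit several candidate learners, collect their out-of-fold (cross-validated) predictions, and choose combination weights that minimize the cross-validated loss. The sketch below is illustrative only and is not the authors' R implementation. It is written in Python, ignores censoring entirely, uses a plain continuous outcome with squared-error loss, and restricts itself to two hypothetical candidate learners (`fit_mean` and `fit_linear`) so that the optimal convex weight has a closed form.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Hypothetical candidate learner 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical candidate learner 2: one-feature least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_predictions(fit, xs, ys, k):
    """Out-of-fold predictions: each point is predicted by a model
    trained on the folds that do not contain it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds


def super_learner(xs, ys, k=5):
    """Return ((w_mean, w_linear), predict) for a two-learner ensemble."""
    z1 = cv_predictions(fit_mean, xs, ys, k)
    z2 = cv_predictions(fit_linear, xs, ys, k)
    # Minimize sum (y - (1 - w) * z1 - w * z2)^2 over w, then clip to [0, 1].
    # The objective is a convex quadratic in w, so clipping the
    # unconstrained minimizer gives the constrained optimum.
    d = [b - a for a, b in zip(z1, z2)]
    denom = sum(v * v for v in d)
    w = sum((y - a) * v for y, a, v in zip(ys, z1, d)) / denom if denom else 0.5
    w = min(1.0, max(0.0, w))
    # Refit both learners on all the data and combine with the chosen weights.
    m1, m2 = fit_mean(xs, ys), fit_linear(xs, ys)
    return (1.0 - w, w), (lambda x: (1.0 - w) * m1(x) + w * m2(x))
```

Calling `super_learner(xs, ys)` on data with a strong linear trend should place most of the weight on the linear learner. A practical analysis would use more candidate learners, a constrained optimizer for the weights, and a loss function appropriate to censored outcomes.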
Authors
杨嵛惠
王静娴
赵芃
李业棉
陈方尧
Yang Yuhui; Wang Jingxian; Zhao Peng; Li Yemian; Chen Fangyao (Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an 710061, China)
Source
《中国医院统计》
2021, Issue 1, pp. 86-90 (5 pages)
Chinese Journal of Hospital Statistics
Funding
National Natural Science Foundation of China (81703325).