Abstract
Objective: To investigate how categorizing or feature scaling continuous variables affects the internal and external validation performance of medical prediction models. Methods: Using Surveillance, Epidemiology, and End Results (SEER) data, the continuous variables (age and tumor diameter) were either categorized or feature scaled before model construction, and each model's performance was then validated internally and externally. The C statistic was used as the performance measure. Results: Categorizing continuous variables lowered the C statistic in both internal and external validation, and this pattern held across multiple algorithms. Feature scaling of continuous variables raised the C statistic in external validation but had no appreciable effect on the C statistic in internal validation, and this pattern also held across multiple algorithms. Conclusion: How continuous variables are processed can affect the internal and external validation performance of medical prediction models. The authors recommend routinely feature scaling the continuous variables in a model and selecting the best-performing scaling method.
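The three operations named in the abstract (categorization, feature scaling, and scoring by the C statistic) can be illustrated with a minimal sketch. This is not the authors' code: the ages and outcomes below are invented toy values rather than SEER data, the 60-year cut point and all function names are assumptions made for illustration, and the C statistic is computed for a binary outcome using age alone as the score. A survival model would normally use a censoring-aware version such as Harrell's C.

```python
def c_statistic(scores, outcomes):
    """C statistic for a binary outcome: the share of (event, non-event)
    pairs in which the event has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def min_max_scale(values, lo, hi):
    """Feature scaling: map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def categorize(values, cut_points):
    """Categorization: replace each value with the index of its bin."""
    return [sum(v >= c for c in cut_points) for v in values]


# Toy data, invented for illustration only (not SEER records).
ages = [35, 42, 48, 55, 61, 63, 67, 72, 78, 84]
event = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

# Use each version of age directly as the risk score.
c_raw = c_statistic(ages, event)
c_scaled = c_statistic(min_max_scale(ages, min(ages), max(ages)), event)
c_binned = c_statistic(categorize(ages, [60]), event)  # hypothetical cut at 60

print(round(c_raw, 3))     # 0.792
print(round(c_scaled, 3))  # 0.792
print(round(c_binned, 3))  # 0.625
```

In this toy case, dichotomizing age creates tied scores and the C statistic drops, whereas min-max scaling leaves the ranking of a single predictor unchanged. Any effect of scaling would have to come from how a multivariable algorithm is fitted, which this sketch does not model. `min_max_scale` takes `lo` and `hi` as arguments so that, under one common convention, limits estimated on the development data can be reused on an external cohort.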
Authors
Liu Yuepeng (刘岳鹏)
Yang Yuping (杨玉萍)
Li Rui (李睿)
Liu Su (刘苏)
Yang Yanjun (杨艳君)
Affiliations: Xuzhou Institute of Medical Sciences, Xuzhou 221006, Jiangsu Province, China; Central Laboratory, Xuzhou Central Hospital; Department of Anesthesiology, the Affiliated Hospital of Xuzhou Medical University
Source
China Digital Medicine (《中国数字医学》)
2022, Issue 6, pp. 26-30 (5 pages)
Keywords
Medical prediction model
Continuous variable
Feature scaling
Classification