Abstract
Objective To identify risk factors for incident urinary incontinence (UI) in Chinese women using the random forest machine-learning algorithm, and to evaluate how well each risk factor predicts UI onset.
Methods A baseline survey of 55477 adult women from six provinces of China, selected by multistage stratified cluster sampling, was conducted between February 2014 and January 2016 and followed by telephone interviews from June to December 2018. Women who were free of UI at baseline and had complete UI diagnostic data at follow-up were included. Using random undersampling at a 1:1 ratio, women who had not developed UI by follow-up were randomly selected as controls, equal in number to the incident UI cases, and the combined data were randomly divided into a training set and a test set at a 7:3 ratio. Candidate variables with P<0.2 in univariate analysis were entered into a random forest model built on the training set; risk factors for incident UI were screened and ranked by importance in the training set and then validated in the test set.
Results A total of 30658 women (55.26%, 30658/55477) completed follow-up, with a median follow-up of 3.7 years. Of the 24985 women without UI at baseline included in this study, 1757 (7.03%, 1757/24985) developed UI at follow-up: 1117 (4.47%, 1117/24985) stress UI, 243 (0.97%, 243/24985) urgency UI and 397 (1.59%, 397/24985) mixed UI. The mean out-of-bag (OOB) error rate was lowest with the number of features fixed at 2 and the number of decision trees at 300; with these settings, the model's classification accuracy was 64.3%, sensitivity 64.2% and specificity 64.4%. Ranked by mean decrease in the Gini index, the top 10 predictors of incident UI were age, parity, mode of delivery, body mass index (BMI), menopausal status, history of diabetes, education level, history of pelvic surgery, urban/rural residence, and marital status.
Conclusion Using the random forest algorithm, the top 10 factors predicting incident UI in Chinese women were identified from among many candidate factors: age, parity, mode of delivery, BMI, menopausal status, history of diabetes, education level, history of pelvic surgery, urban/rural residence, and marital status.
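The 1:1 undersampling and 7:3 split in the Methods can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each participant is a dict with a binary `incident_ui` field (1 = developed UI during follow-up), and the function names and fixed seed are hypothetical.

```python
import random


def undersample_1to1(records, label_key="incident_ui", seed=2021):
    """Keep every incident case and add an equal number of randomly drawn non-cases."""
    rng = random.Random(seed)
    cases = [r for r in records if r[label_key] == 1]
    non_cases = [r for r in records if r[label_key] == 0]
    # Draw controls without replacement; raises ValueError if non-cases are fewer than cases.
    controls = rng.sample(non_cases, len(cases))
    return cases + controls


def split_train_test(records, train_fraction=0.7, seed=2021):
    """Shuffle a copy of the records and cut it into training and test sets (7:3 by default)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Applied to the 1757 incident cases, the balancing step would yield 3514 women (1757 cases and 1757 controls) before splitting. Undersampling discards most non-cases, trading some information for a balanced outcome that discourages the forest from favoring the majority "no UI" class.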
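The tuning criterion, the out-of-bag (OOB) error, comes from bootstrap sampling: each tree is grown on n rows drawn with replacement, so a given row is left out of that tree with probability (1 - 1/n)^n, about e^(-1) ≈ 36.8% for large n. Each row can then be classified by a majority vote of only the trees that never saw it, and the share of rows misclassified this way is the OOB error rate. The sketch below shows both steps with hypothetical helper names; it is a conceptual illustration only, since the abstract does not describe the study's software or exactly how the OOB error was averaged.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag (OOB)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def oob_error_rate(y_true, oob_votes):
    """Misclassification rate of the majority OOB vote.

    oob_votes[i] holds the predictions for row i from the trees that did not
    see row i during training; rows with no OOB vote are skipped, and ties go
    to the label that appears first in the vote list.
    """
    errors = scored = 0
    for truth, votes in zip(y_true, oob_votes):
        if not votes:
            continue
        majority = Counter(votes).most_common(1)[0][0]
        scored += 1
        errors += int(majority != truth)
    return errors / scored
```

Repeating this for each candidate setting (presumably the number of variables tried at each split, and the number of trees) and keeping the one with the lowest OOB error mirrors the selection of 2 features and 300 trees reported in the Results.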
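Accuracy, sensitivity and specificity are the standard confusion-matrix ratios, with incident UI as the positive class. The helper below is a generic sketch of those definitions (the function name is hypothetical) and uses no study data.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels, 1 = incident UI."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # share of true UI cases predicted as UI
    specificity = tn / (tn + fp)  # share of non-cases predicted as non-UI
    return accuracy, sensitivity, specificity
```

Accuracy equals p × sensitivity + (1 - p) × specificity, where p is the share of UI cases in the evaluated set. Because 1:1 undersampling leaves that set roughly balanced (p ≈ 0.5), accuracy should sit close to the mean of sensitivity and specificity, which is consistent with the reported 64.3%, 64.2% and 64.4%.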
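Variable importance was ranked by the mean decrease in the Gini index. In the usual formulation, each split on a variable is credited with the drop in Gini impurity, 1 - Σ p_k², from the parent node to its size-weighted children, and these credits are summed per variable and averaged over the trees (R's randomForest package, for example, reports this as MeanDecreaseGini). The sketch below implements that bookkeeping; the function names, the (variable, decrease) log format and the placeholder variables x1 to x3 are hypothetical, and real implementations differ in weighting and normalization.

```python
from collections import defaultdict


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels (0.0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Drop in impurity when `parent` is split into `left` and `right`, weighting children by size."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - children


def mean_decrease_gini(split_log, n_trees):
    """Sum the decreases credited to each variable, average over trees, and rank high to low."""
    totals = defaultdict(float)
    for variable, decrease in split_log:
        totals[variable] += decrease
    scores = {v: total / n_trees for v, total in totals.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For instance, `mean_decrease_gini([("x1", 0.3), ("x1", 0.2), ("x2", 0.1)], n_trees=2)` ranks x1 first with a score of 0.25, ahead of x2 at 0.05.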
Authors
Pang Haiyu (庞海玉), Zhu Lan (朱兰), Xu Tao (徐涛), Liu Qing (刘青), Li Zhaoai (李兆艾), Gong Jian (龚健), Wang Yuling (王玉玲), Wang Juntao (汪俊涛), Xia Zhijun (夏志军), Lang Jinghe (郎景和)
Affiliations
Medical Research Center, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing 100730, China
Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, National Clinical Research Center for Obstetric and Gynecologic Diseases, Beijing 100730, China
Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
Department of Obstetrics and Gynecology, Maternal and Child Health Hospital of Gansu Province, Lanzhou 730050, China
Department of Obstetrics and Gynecology, Children's Hospital of Shanxi, Women Health Center of Shanxi, Taiyuan 030013, China
Department of Obstetrics and Gynecology, Wuxi Maternal and Child Health Hospital, Nanjing Medical University, Wuxi 214001, China
Department of Obstetrics and Gynecology, Maternal and Child Health Hospital of Foshan of Guangdong Province, Foshan 528000, China
Department of Obstetrics and Gynecology, Maternal and Child Health Hospital of Guiyang, Guiyang 550001, China
Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, Shenyang 110004, China
Source
Chinese Journal of Obstetrics and Gynecology (《中华妇产科杂志》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2021, Issue 8, pp. 554-560 (7 pages)
Funding
National Key Research and Development Program of China (2018YFC2002201)
CAMS Innovation Fund for Medical Sciences (2017-I2M-1-002)
Keywords
Urinary incontinence
Risk factors
Random allocation
Algorithms
Machine learning
Longitudinal study
Random forest