Abstract
Objective: To identify factors influencing perimenopausal syndrome (PMS) in perimenopausal women using machine learning algorithms, and to construct a model predicting the risk of moderate-to-severe PMS in this population. Methods: Perimenopausal women from 48 communities in Pudong New Area, Shanghai, were enrolled and, according to their Kupperman index scores, divided into a normal-or-mild PMS group and a moderate-to-severe PMS group. The data were randomly split into a training set and a testing set, and features were selected using the Boruta algorithm and the SHAP algorithm. Logistic regression (LR), random forest (RF), support vector machine (SVM), and gradient boosting decision tree (GBDT) models were built, and their performance was evaluated using accuracy, precision, recall, the area under the receiver operating characteristic curve (AUC), and the F1-score. Results: A total of 856 perimenopausal women were included, 557 in the normal-or-mild PMS group and 299 in the moderate-to-severe PMS group; 599 were assigned to the training set and 257 to the testing set. Nine features (employment status, exercise, age, menstrual status, history of medical visits, overweight, place of residence, history of health education, and household registration) were selected by the Boruta algorithm and SHAP analysis as predictors for the final model. After parameter tuning, the 10-fold cross-validation AUCs on the training set for the LR, RF, SVM, and GBDT models were 0.64, 0.77, 0.74, and 0.77, respectively. On the testing set, the AUCs were 0.63, 0.69, 0.69, and 0.73, and the recall values were 0.59, 0.55, 0.55, and 0.62, respectively. Conclusion: Among the models constructed to predict the risk of moderate-to-severe PMS in perimenopausal women, the GBDT model showed the best predictive performance and has some predictive value. It may offer a new approach to the early identification of, and intervention for, moderate-to-severe PMS in perimenopausal women.
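The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score, and AUC) can be illustrated with a minimal sketch. The labels, scores, and the 0.5 decision threshold below are hypothetical toy values, not the study's data, and the function names are illustrative rather than the authors' code. Label 1 stands for the moderate-to-severe PMS group.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher than
    a random negative case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true groups and predicted probabilities from some model.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Recall is the share of true moderate-to-severe cases that the model flags, which is why the abstract reports it alongside AUC: a screening model that misses many such cases is of limited use even when its AUC is acceptable.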
Authors
张敏
顾婷婷
关蔚
刘想想
施君瑶
ZHANG Min; GU Tingting; GUAN Wei; LIU Xiangxiang; SHI Junyao (Department of Women’s Health Care, Maternal & Child Health Care Institute of Pudong New Area, Shanghai 201206, China)
Source
《医学新知》
CAS
2024, Issue 8, pp. 871-879 (9 pages)
New Medicine
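Funding
General Program of the Health Commission of Pudong New Area, Shanghai (PW2020A-79)
Second Round of the Medical Discipline Construction Project of Pudong New Area, Shanghai (PWYgts2021-02).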
Keywords
Perimenopausal syndrome
Perimenopausal women
Machine learning
Prediction