Abstract

Background: Depression is a mood disorder caused by a variety of factors. As the pace of life accelerates and competitive pressure in daily life and at work grows, the incidence of depression is increasing year by year. In-depth study of its pathogenesis and the development of depression risk prediction models are therefore becoming increasingly important.

Method: Data were drawn from the 2017–2018 cycle of the National Health and Nutrition Examination Survey (NHANES) database, a publicly available database that uses a multi-stage, stratified, clustered probability sampling design to obtain a nationally representative sample of non-institutionalized US civilians. Participants completed home interviews, laboratory measurements, and a physical examination. Details of the survey design have been published previously. Risk factors for depression were evaluated across multiple variables, including age, sex, and comorbidities. Four machine learning algorithms (logistic regression, Lasso regression, support vector machine, and random forest) were used to build predictive classification models, which were compared by area under the receiver operating characteristic curve (AUC) and by accuracy. The models were validated using 10-fold cross-validation.

Result: After invalid samples were excluded, 815 samples were included, of which 570 cases were assigned to the validation set and 245 cases to the training set. The nomogram for depression risk based on logistic regression had an AUC of 0.73. Among the other three machine learning models, the Lasso regression-based model had an AUC of 0.548, the support vector machine had a mean AUC of 0.695, and the random forest had an AUC of 0.613. The support vector machine-based model performed best of these three.

Conclusion: Random forest-based prediction models can give clinicians decision support when an exact diagnosis is difficult. The model has good clinical utility and helps clinicians identify high-risk patients and deliver individualized treatment. The four established models (logistic regression, Lasso regression, support vector machine, and random forest) all have good predictive power.
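
The evaluation described in the Method (10-fold cross-validation, with models compared by AUC and accuracy) can be illustrated with a minimal, dependency-free sketch. It is not the study's code: the data are synthetic, the function names are hypothetical, the folds are stratified by class (an assumption the abstract does not state), and `fit_centroid_scorer` is a toy stand-in for the four models named above.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.0):
    """Share of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def stratified_folds(labels, k=10, seed=0):
    """Shuffle each class separately and deal its indices round-robin into k folds."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::k] + neg[i::k] for i in range(k)]


def fit_centroid_scorer(train_x, train_y):
    """Toy one-feature model (hypothetical): higher score = closer to the positive-class mean."""
    pos_mean = mean([x for x, y in zip(train_x, train_y) if y == 1])
    neg_mean = mean([x for x, y in zip(train_x, train_y) if y == 0])
    return lambda x: abs(x - neg_mean) - abs(x - pos_mean)


def cross_validated_auc(xs, ys, fit, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, and average the k AUC values."""
    aucs = []
    for held_out in stratified_folds(ys, k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        scorer = fit(train_x, train_y)
        fold_labels = [ys[i] for i in held_out]
        fold_scores = [scorer(xs[i]) for i in held_out]
        aucs.append(roc_auc(fold_labels, fold_scores))
    return mean(aucs)


if __name__ == "__main__":
    # Synthetic single-feature data: positives are shifted upward by one unit.
    rng = random.Random(42)
    ys = [i % 2 for i in range(200)]
    xs = [rng.gauss(1.0 if y else 0.0, 1.0) for y in ys]
    print(f"mean 10-fold AUC: {cross_validated_auc(xs, ys, fit_centroid_scorer):.3f}")
```

Averaging the per-fold values yields a "mean AUC" of the kind reported for the support vector machine. Replacing `fit_centroid_scorer` with a real classifier's fit-and-score function would leave the evaluation loop unchanged.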