Abstract
Cataract is a very common eye disease and the leading cause of blindness. Given its burden on society, this work focuses on identifying risk factors for cataract and building robust machine learning models that use these factors to predict cataract risk. The data were collected by a physical examination center in Shanghai, China, and cover more than 120,000 examinees and about 500 physical examination metrics. First, association rules were used to select 39 abnormalities that are more likely to be associated with cataract, and the significance of these abnormalities was tested with univariate and multivariate analysis. The results indicate that age, diabetes, refractive error, retinal arteriosclerosis, thyroid nodules, and incomplete mammary gland degeneration significantly increase the likelihood of cataract. Various machine learning models were then compared on their performance in predicting cataract risk from these six factors; the logistic regression model and the decision-tree-based ensemble methods outperform the others. The test-set AUC of these models reaches 0.84.
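The association-rule screening step can be illustrated with a minimal sketch. It assumes each examinee is represented as a set of abnormality labels and scores a rule of the form "abnormality implies cataract" by support, confidence, and lift. The function name `rule_metrics`, the label strings, and the toy records are hypothetical and are not taken from the study's data or code; the abstract does not state which metrics or thresholds were actually used.

```python
from typing import Dict, List, Set


def rule_metrics(records: List[Set[str]], antecedent: str, consequent: str) -> Dict[str, float]:
    """Score the rule `antecedent -> consequent` over a list of label sets."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent in r)
    n_c = sum(1 for r in records if consequent in r)
    n_ac = sum(1 for r in records if antecedent in r and consequent in r)

    support = n_ac / n                          # P(antecedent and consequent)
    confidence = n_ac / n_a if n_a else 0.0     # P(consequent | antecedent)
    base_rate = n_c / n                         # P(consequent)
    lift = confidence / base_rate if base_rate else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy records (hypothetical, for illustration only).
records = [
    {"diabetes", "cataract"},
    {"diabetes", "cataract"},
    {"diabetes"},
    {"refractive_error", "cataract"},
    {"refractive_error"},
    {"thyroid_nodules"},
    set(),
    set(),
]

metrics = rule_metrics(records, "diabetes", "cataract")
print(metrics)
```

A lift above 1 means the abnormality co-occurs with cataract more often than the base rate would suggest, which is one plausible criterion for keeping it as a candidate before significance testing.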
Funding
Supported by the National Key R&D Program of China under Grant No. 2020AAA0103800.