Using the personal credit loan default data of Zhongyuan Bank provided by the CCF competition, this paper carried out data cleaning and feature engineering, reducing the initial 38 features to 18. The important factors affecting bank personal credit risk were then explored by combining the 5C theory and expected income theory. The top five factors ranked by feature importance were: total credit working balance, the number of days between the loan disbursement date and the initial date, the borrower’s average loan score, the current loan interest rate, and the anonymous variable f0. To improve the accuracy of bank personal credit risk assessment, three methods of handling imbalanced data (SMOTE, random under-sampling, and SMOTEENN) were compared on a random forest model, and SMOTEENN combined sampling performed best. Four machine learning models (decision tree, random forest, AdaBoost, and LightGBM) were then built, and the results showed that LightGBM achieved the highest accuracy after balancing, reaching 96.1%.
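To make the idea of balancing concrete, the following is a minimal sketch of random under-sampling, the simplest of the three resampling methods compared above. It is written in plain Python as an illustration only and is not the implementation used in this study; the function name `random_under_sample`, the `seed` parameter, and the toy data are assumptions introduced here.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the rarest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # The size of the rarest class sets the target size for all classes.
    n_min = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        kept = rng.sample(group, n_min)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * n_min)
    return out_rows, out_labels


# Toy data (made up for illustration): 8 non-default loans (0), 2 defaults (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(Counter(balanced_labels))  # both classes now have 2 rows each
```

Under-sampling discards majority-class records, whereas SMOTE synthesizes new minority-class records and SMOTEENN follows that over-sampling with a cleaning step; the sketch covers only the first of these.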
Modeling and Simulation