Funding: This work is supported by the National Natural Science Foundation of China (Nos. 72071150, 71871174).
Abstract: Cardiovascular disease (CVD) has gradually become one of the main threats to the life and health of residents, and exploring its influencing factors and risk assessment methods has become a general trend. In this paper, a machine learning-based decision-making mechanism for CVD risk assessment is designed. In this mechanism, logistic regression analysis and a factor analysis model are used to select age, obesity degree, blood pressure, blood lipids, blood sugar, smoking status, drinking status, and exercise status as the main pathogenic factors of CVD, and an index system for CVD risk assessment is established. Then, a two-stage model combining K-means cluster analysis and random forest (RF) is proposed to evaluate and predict the risk of CVD, and its predictions are compared with those of Bayesian discrimination, K-means cluster analysis, and RF. The results show that the proposed two-stage model predicts better than the compared methods. Moreover, several suggestions for the government, the medical industry, and the public are provided based on the research results.
Funding: Supported by the National Natural Science Foundation of China (No. 72071150).
Abstract: Stroke is a chronic cerebrovascular disease that carries a high risk. Stroke risk assessment is of great significance in preventing, reversing, and reducing the spread of stroke and the health hazards it causes. Aiming to objectively predict and identify strokes, this paper proposes a new machine learning-based stroke risk assessment decision-making model named Logistic-AdaBoost (Logistic-AB). First, the categorical boosting (CatBoost) method is used to perform feature selection over all stroke features, and 8 main features are selected to form a new index evaluation system for predicting the risk of stroke. Second, the borderline synthetic minority oversampling technique (SMOTE) algorithm is applied to transform the unbalanced stroke dataset into a balanced one. Finally, the Logistic-AB model is constructed, and its overall prediction performance is evaluated by comparing it with ten other similar models. The comparison shows that the proposed model outperforms the two single algorithms (logistic regression and AdaBoost) on the four indicators of recall, precision, F1 score, and accuracy, and that its overall performance is better than that of common machine learning algorithms. The Logistic-AB model presented in this paper can therefore predict patients' stroke risk more accurately.
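The four indicators named in the abstract above (recall, precision, F1 score, and accuracy) follow directly from the confusion-matrix counts of a binary classifier. The sketch below uses their standard definitions; the function name `classification_metrics` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute recall, precision, F1 and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}


# Toy, made-up labels (1 = stroke, 0 = no stroke), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On an unbalanced dataset, accuracy alone can look high even when most positive cases are missed, which is one reason to report recall and precision alongside it.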
Funding: This work is supported by the National Natural Science Foundation of China (Nos. 72071150, 71871174).
Abstract: As the banking industry gradually steps into the digital era of Bank 4.0, business competition is becoming increasingly fierce, and banks are facing the problem of massive customer churn. To better retain their customers, it is crucial for banks to accurately predict which customers tend to churn. Treating customer churn as a typical binary classification problem, this paper establishes an early-warning model for credit card customer churn. Specifically, a dual search algorithm named GSAIBAS, which incorporates the Golden Sine Algorithm (GSA) and an Improved Beetle Antennae Search (IBAS), is proposed to optimize the parameters of the CatBoost algorithm, forming the GSAIBAS-CatBoost model. In particular, considering that the BAS algorithm has simple parameters and is prone to falling into local optima, a Sigmoid nonlinear convergence factor and the lane flight equation are introduced to adjust the fixed step size of the beetle. This improved BAS algorithm with variable step size is then fused with the GSA to form the GSAIBAS algorithm, which can achieve dual optimization. Moreover, an empirical analysis is carried out on the credit card customer data set from the Analyttica official platform. The empirical results show that the Area Under Curve (AUC) and recall of the proposed model reach 96.15% and 95.56%, respectively, which are significantly better than those of the other 9 common machine learning models. Compared with several existing optimization algorithms, the GSAIBAS algorithm achieves higher precision in parameter optimization for CatBoost. Experiments on two other customer churn data sets from the Kaggle data platform further verify that the proposed model is valid and feasible.
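To make the idea of replacing a fixed BAS step with a Sigmoid-shaped schedule concrete, here is a minimal sketch of a basic Beetle Antennae Search whose step size decays along a sigmoid curve. The abstract does not give the paper's formulas, so the schedule in `sigmoid_step`, its parameters (`step_max`, `step_min`, `k`), and the sphere test function are all assumptions made for illustration. This is not the authors' IBAS or GSAIBAS: the flight equation and the fusion with GSA are omitted.

```python
import math
import random


def sigmoid_step(t, n_iter, step_max=1.0, step_min=0.01, k=10.0):
    """Hypothetical sigmoid schedule: near step_max early, near step_min late."""
    s = 1.0 / (1.0 + math.exp(k * (t / n_iter - 0.5)))
    return step_min + (step_max - step_min) * s


def bas_minimize(f, x0, n_iter=200, seed=0):
    """Basic beetle antennae search with a variable (sigmoid) step size."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for t in range(n_iter):
        step = sigmoid_step(t, n_iter)
        d = step  # assumption: antenna length shrinks together with the step
        # Random unit direction in which the two antennae point.
        b = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        x_left = [xi + d * bi for xi, bi in zip(x, b)]
        x_right = [xi - d * bi for xi, bi in zip(x, b)]
        diff = f(x_left) - f(x_right)
        sign = (diff > 0) - (diff < 0)
        # Move toward the antenna that senses the lower objective value.
        x = [xi - step * bi * sign for xi, bi in zip(x, b)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


def sphere(x):
    """Toy objective standing in for a model's validation loss."""
    return sum(v * v for v in x)


start = [3.0, -2.0]
best_x, best_f = bas_minimize(sphere, start)
print(best_x, best_f)
```

In a hyperparameter-tuning setting, `f` would instead map a parameter vector to a validation loss of the model being tuned; large early steps favor exploration, and the small late steps refine the best region found.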