Abstract
This paper proposes an improved XGBoost (eXtreme Gradient Boosting) model based on a genetic algorithm for hypertension recipe recognition. The approach consists of three steps. First, the dataset is preprocessed, including missing-value imputation, removal of duplicate records, and feature analysis. Second, a genetic algorithm adaptively optimizes the parameters of the XGBoost model. Finally, the hypertension recipe recognition model is trained with the optimal parameters and applied to hypertension recipe recognition. The results show that parameters optimized by the genetic algorithm yield better recognition performance than those obtained by grid search. Moreover, the proposed model performs well on four evaluation measures (accuracy, recall, F1 score, and area under the curve (AUC)), outperforming on average four other ensemble classifiers (random forest, GBDT, Bagging, and AdaBoost), and it improves the interpretability of the recipe recognition model.
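The second step, genetic-algorithm search over model parameters, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the search space (parameter names follow common XGBoost options, but the ranges are invented), the operators (tournament selection, uniform crossover, random-reset mutation, elitism), and `toy_fitness`, a hypothetical stand-in for a cross-validated score of a trained classifier.

```python
import random

# Hypothetical search space: names mirror common XGBoost parameters,
# but the ranges are illustrative assumptions, not the paper's settings.
SEARCH_SPACE = {
    "max_depth": (2, 10, int),
    "learning_rate": (0.01, 0.3, float),
    "subsample": (0.5, 1.0, float),
}


def sample_gene(name, rng):
    """Draw one parameter value uniformly from its range."""
    lo, hi, kind = SEARCH_SPACE[name]
    return rng.randint(lo, hi) if kind is int else rng.uniform(lo, hi)


def random_individual(rng):
    return {name: sample_gene(name, rng) for name in SEARCH_SPACE}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    # Random-reset mutation: resample a gene with probability `rate`.
    return {k: (sample_gene(k, rng) if rng.random() < rate else v)
            for k, v in ind.items()}


def tournament(pop, scores, rng, k=3):
    # Pick k individuals at random and return the fittest of them.
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=20, generations=15, seed=0):
    """Maximize `fitness` over SEARCH_SPACE; returns (best_params, best_score)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = dict(ind), s
        # Elitism: the best individual so far always survives.
        next_pop = [dict(best)]
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    return best, best_score


def toy_fitness(params):
    # Stand-in objective with a known optimum; a real run would return
    # a cross-validated metric of a model trained with `params`.
    return (-((params["max_depth"] - 6) ** 2) / 16
            - (params["learning_rate"] - 0.1) ** 2
            - (params["subsample"] - 0.8) ** 2)


if __name__ == "__main__":
    best_params, best_score = genetic_search(toy_fitness)
    print(best_params, best_score)
```

To use this on real data, one would presumably replace `toy_fitness` with a function that trains a classifier such as `xgboost.XGBClassifier` with the candidate parameters and returns a mean cross-validated metric (for example F1 or AUC). Grid search, the baseline named in the abstract, would instead evaluate every point of a fixed parameter grid.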
Authors
LEI Xue-mei (雷雪梅)
XIE Yi-tong (谢依彤)
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2018, Issue B06, pp. 476-481 (6 pages)