Abstract
To assist doctors in the early prediction of lung cancer, this paper proposes GA-XGBoost, a prediction method in which a genetic algorithm (GA) optimizes the ensemble learning algorithm XGBoost. To address common machine-learning problems such as small sample sizes and poor data quality, the final lung cancer prediction model combines SMOTE oversampling with random forest feature-importance ranking to predict and classify lung cancer. Tests on the dataset show that, compared with the K-nearest neighbor (KNN), SVM, decision tree and XGBoost algorithms, the proposed model achieves the best overall performance, with an accuracy of 93.2% and a faster response speed.
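The two resampling and optimization steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `smote` and `genetic_search` are hypothetical helper names, and the toy fitness function stands in for what would, in practice, be a cross-validated XGBoost score over hyperparameters such as learning rate and tree depth (the abstract does not say which hyperparameters were tuned). In a real project, `imblearn.over_sampling.SMOTE` and scikit-learn's `RandomForestClassifier.feature_importances_` would typically cover the oversampling and feature-ranking steps; the feature-ranking step is not sketched here.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of base among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, rng=None):
    """Maximise fitness over a box of real-valued parameters with a simple GA.

    Hypothetical sketch: in GA-XGBoost the fitness would be a validation
    score of XGBoost trained with the candidate hyperparameters.
    """
    rng = rng or random.Random()
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # uniform crossover: each gene comes from one parent at random
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # mutation: resample a gene uniformly within its bounds
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[g] = rng.uniform(lo, hi)
            children.append(child)
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    rng = random.Random(0)
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    print(smote(minority, n_new=3, rng=rng))
    # Toy fitness (hypothetical) standing in for cross-validated XGBoost
    # accuracy; it peaks at learning_rate=0.1, max_depth=6. A real search
    # would round max_depth to an integer before training.
    toy = lambda p: -((p[0] - 0.1) ** 2) - ((p[1] - 6) / 10) ** 2
    print(genetic_search(toy, bounds=[(0.01, 0.3), (2, 10)], rng=rng))
```

Because elitism carries the best candidate into every new generation, the returned individual is never worse than the best one in the initial random population; truncation selection and uniform crossover are chosen here only for brevity.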
Authors
Ke Dong (柯东); Yan Junfeng (晏峻峰)
School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
Source
Computer Era (《计算机时代》), 2023, No. 11, pp. 131-135, 140 (6 pages)
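Funding
Key Project of the Hunan Provincial Department of Education, "Research on Knowledge Representation and Fusion for Traditional Chinese Medicine Diagnosis and Treatment of Critical and Severe Illnesses with Fuzzy Uncertainty" (21A0250).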