Abstract
Objective To predict the 5-year survival status of patients with non-small cell lung cancer (NSCLC) using machine learning, and to improve both the efficiency and the accuracy of prediction. Methods Experiments were performed on NSCLC data from the SEER database. To address the class imbalance in the patient data, the Borderline-SMOTE method was used for data sampling. A perturbation-based feature selection (PFS) method and the decision tree (DT) algorithm were then used to select features and build a postoperative survival prediction model. Results The balanced dataset included seven prognosis-related variables: age, histological grade, race, primary site, tumor stage, pathological type (International Classification of Diseases coding), and surgery type. Compared with the LASSO, Tree-based, PFS-SVM and PFS-kNN models, the model built with PFS-DT achieved the best predictive performance. Conclusions The PFS-DT survival prediction model effectively improves the accuracy of postoperative survival prediction for patients with NSCLC, and can serve as a reference for clinicians in choosing treatment and improving prognosis.
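The sampling step named in the Methods can be illustrated with a minimal sketch. The abstract does not say which Borderline-SMOTE variant or parameters the authors used, so the code below is only an assumption-laden illustration of the general idea (the Borderline-SMOTE1 form): a minority sample counts as "borderline" when at least half, but not all, of its m nearest neighbours belong to the majority class, and new samples are interpolated between such a sample and one of its k nearest minority neighbours. The function name, the toy 2-D points, and the values of m and k are hypothetical and are not taken from the paper or from SEER.

```python
import math
import random


def borderline_smote(minority, majority, m=5, k=3, seed=0):
    """Return synthetic minority points interpolated from borderline samples.

    Illustrative sketch only: one synthetic point per borderline sample.
    """
    rng = random.Random(seed)
    # Label 0 = minority, 1 = majority; minority samples come first.
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    synthetic = []
    for i, p in enumerate(minority):
        # m nearest neighbours of p among all other samples.
        others = labelled[:i] + labelled[i + 1:]
        neighbours = sorted(others, key=lambda s: math.dist(p, s[0]))[:m]
        n_major = sum(label for _, label in neighbours)
        # Borderline ("danger") means: at least half, but not all, of the
        # neighbours are majority. All-majority is treated as noise, and
        # fewer than half as safe; both are skipped.
        if not (m / 2 <= n_major < m):
            continue
        # Interpolate towards one of the k nearest minority neighbours.
        peers = minority[:i] + minority[i + 1:]
        nearest_peers = sorted(peers, key=lambda q: math.dist(p, q))[:k]
        if not nearest_peers:
            continue
        q = rng.choice(nearest_peers)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Hypothetical toy data: two minority points sit next to the majority cluster.
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 0.9)]
majority = [(1.1, 1.0), (1.0, 1.2), (0.9, 1.1), (1.2, 1.2), (2.0, 2.0), (2.1, 1.9)]

synthetic = borderline_smote(minority, majority, m=3, k=2, seed=0)
print(len(synthetic), "synthetic minority samples generated")
```

In practice a maintained implementation, such as the BorderlineSMOTE class in the imbalanced-learn package, would likely be preferable to a hand-written one. The sketch is meant only to show why oversampling concentrates on samples near the class boundary.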
Authors
赵阳
汪晓洁
马磊
邵党国
相艳
熊馨
张力
Zhao Yang; Wang Xiaojie; Ma Lei; Shao Dangguo; Xiang Yan; Xiong Xin; Zhang Li (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Department of Medical Oncology, the First People's Hospital of Yunnan Province, Kunming 650032, China)
Source
《国际生物医学工程杂志》
CAS
2019, Issue 4, pp. 336-341 (6 pages)
International Journal of Biomedical Engineering
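Funding
National Natural Science Foundation of China (81760022)
China Postdoctoral Science Foundation (2016M592894XB)
Yunnan Provincial Major Science and Technology Special Project (2018ZF017)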
Keywords
Non-small cell lung cancer
Imbalance
Feature selection
Survival prediction