Abstract
Lung cancer has the highest incidence and the highest mortality of all malignant tumors in China. The average five-year survival rate of lung cancer patients in China is still below 20%, so diagnostic and treatment outcomes need further improvement. It is therefore worthwhile to study the factors that influence lung cancer prognosis and to build prognostic prediction models that estimate each patient's prognostic risk and survival. Such models can help clinicians assess prognosis and can reveal new disease-related factors. Using machine learning algorithms and multi-omics data of lung cancer patients from The Cancer Genome Atlas (TCGA) database, this paper takes whether a patient survives longer than five years as the prediction target and applies a weighted co-expression network algorithm to identify the key feature genes associated with prognosis and survival. Prognostic prediction models are then built by combining classification algorithms with the weighted co-expression network algorithm, and their classification performance is evaluated by the AUC (area under the ROC curve). The model built on KNN (k-nearest-neighbor) regression performs best and predicts fairly accurately whether a lung cancer patient's survival time exceeds five years.
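The evaluation step summarized above can be illustrated with a minimal sketch. It assumes the key feature genes have already been selected (the co-expression network step is not reproduced here), uses Euclidean distance with k = 3, and treats the five-year outcome as a 0/1 label. Under that encoding, KNN regression returns the fraction of a patient's nearest neighbors who survived more than five years, and that fraction serves as the score for the AUC, computed here through the pairwise (Mann-Whitney) formulation. The values of k, the distance metric, and the toy data are illustrative assumptions, not the authors' settings or code.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_regression_scores(
    train_X: Sequence[Point],
    train_y: Sequence[int],
    test_X: Sequence[Point],
    k: int = 3,
) -> List[float]:
    """KNN regression on a 0/1 label: each test sample's score is the mean
    label (fraction 'survived > 5 years') among its k nearest training
    samples under Euclidean distance."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scores: List[float] = []
    for x in test_X:
        # Pair each training sample's distance with its label, nearest first.
        ranked: List[Tuple[float, int]] = sorted(
            (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)
        )
        neighbors = ranked[:k]
        scores.append(sum(y for _, y in neighbors) / len(neighbors))
    return scores


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample outranks a
    random negative one (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): expression values of two hypothetical hub genes;
# label 1 = survival longer than five years, 0 = five years or less.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.95), (0.85, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.12, 0.18), (0.88, 0.85), (0.3, 0.3), (0.7, 0.75)]
test_y = [0, 1, 0, 1]

scores = knn_regression_scores(train_X, train_y, test_X, k=3)
print(scores, roc_auc(test_y, scores))
```

In practice the same pipeline is usually assembled from scikit-learn's `KNeighborsRegressor` and `roc_auc_score`; the pure-Python version above only makes the scoring and ranking logic explicit.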
Source
Advances in Applied Mathematics (《应用数学进展》)
2022, Issue 6, pp. 4022-4031 (10 pages)