Abstract
Objective To develop new, precise biomarkers for predicting prognosis and drug sensitivity in patients with pancreatic cancer (PC) using an integrated machine learning (ML) approach.
Methods Data for 768 patients were obtained from public multicenter cohorts, and gemcitabine resistance data for PC cells were retrieved from the CTRP-V2.0 database. A gemcitabine resistance-related gene signature (GRGS) was built using 95 ML models composed of 9 ML algorithms. According to the GRGS, patients with PC were divided into a high-risk group (385 cases) and a low-risk group (383 cases). Using various packages in R 4.1.3, the hazard ratio (HR) of the GRGS for predicting overall survival (OS) in PC patients was evaluated in the included cohorts, and the GRGS was compared with published gene signatures. AutoDock Vina 1.2.2 was used to screen drugs by molecular docking against the GRGS core genes, and the binding energy scores between drugs and core genes were recorded. Univariate and multivariate analyses used the Cox regression model; survival curves were plotted with the Kaplan-Meier method; the HR was used to assess the difference in survival time between the two groups; the t-test was used to compare continuous variables; and the concordance index (C-index) was used to compare the performance of models and gene signatures.
Results Among the 95 models, the GRGS developed with the least absolute shrinkage and selection operator plus random survival forest (LASSO+RSF) model showed the highest C-index, with a mean of 0.674 across five cohorts. In The Cancer Genome Atlas (TCGA) cohort, pancreatic ductal adenocarcinoma (PDAC) patients with higher GRGS scores had worse OS [HR=7.40, 95% confidence interval (CI): 4.42-12.41, P<0.01]. In an analysis that also included tumor grade, TNM stage, sex, and age, the GRGS was an independent risk factor for poor OS in PDAC patients (HR=1.067, 95% CI: 1.053-1.081, P<0.01). Compared with 43 published gene signatures, the GRGS performed better in assessing prognosis across multiple cohorts (C-index=0.94). SMC4 is one of the nine genes that make up the GRGS, and dasatinib bound the SMC4 protein with a binding energy score of -0.246, which suggests it may be another potential option for gemcitabine-resistant patients.
Conclusion The GRGS is a promising biomarker for evaluating clinical prognosis and systemic treatment in patients with PC.
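The abstract ranks models and gene signatures by C-index. As a rough illustration of what that metric measures, the following is a minimal Python sketch of Harrell's concordance index for right-censored survival data. It is not the authors' implementation (the study reports using R 4.1.3 packages); the function name `concordance_index`, the simplification of skipping tied survival times, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher score means higher predicted risk

    A pair of patients is comparable when the one with the shorter
    observed time actually had an event. Pairs with tied times are
    skipped here, which is a simplification.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied survival times
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(toy_times, toy_events, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported mean of 0.674 across five cohorts should be read.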
Authors
王志峰
程诗博
赵传兵
殷涛
Wang Zhifeng; Cheng Shibo; Zhao Chuanbing; Yin Tao (Department of Pancreatic Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China)
Source
Chinese Journal of Experimental Surgery (《中华实验外科杂志》)
CAS
2024, Issue 1, pp. 35-39 (5 pages)
Funding
National Natural Science Foundation of China (82173196)
Key Research and Development Program of Hubei Province (2022BCA012)
Keywords
Pancreatic cancer
Machine learning
Gene signature
Gemcitabine