Funding: Supported in part by a grant of the National Natural Science Foundation of China (81802255); Clinical Research Project of Shanghai Pulmonary Hospital (FKLY20010); Young Talents in Shanghai (2019 QNBJ); "Dream Tutor" Outstanding Young Talents Program (fkyq1901); Clinical Research Project of Shanghai Pulmonary Hospital (FKLY20001); Respiratory Medicine, a key clinical specialty construction project in Shanghai, promotion and application of multidisciplinary collaboration system for pulmonary non-infectious diseases; Clinical Research Project of Shanghai Pulmonary Hospital (fk18005); Key Discipline in 2019 (Oncology) Project of Shanghai Municipal Health Commission (201940192); Scientific Research Project of Shanghai Pulmonary Hospital (fkcx1903); Shanghai Municipal Commission of Health and Family Planning (2017YQ050); Innovation Training Project of SITP of Tongji University, Key Projects of Leading Talent (19411950300); Youth Project of Hospital Management Research Fund of Shanghai Hospital Association (Q1902037).
Abstract: Patient-derived tumor xenografts (PDXs) are a powerful tool for drug discovery and screening in cancer. However, genotype mismatches in PDXs remain poorly understood and lead to massive economic losses. Here, we established PDX models from 53 lung cancer patients, with a genotype matching rate of 79.2% (42/53). Furthermore, 17 clinicopathological features were examined and entered into stepwise logistic regression (LR) models selected by the lowest Akaike information criterion (AIC), least absolute shrinkage and selection operator (LASSO)-LR, support vector machine (SVM) recursive feature elimination (SVM-RFE), extreme gradient boosting (XGBoost), and gradient boosting with categorical features (CatBoost), in combination with the synthetic minority oversampling technique (SMOTE). Finally, the performance of all models was evaluated by accuracy, area under the receiver operating characteristic curve (AUC), and F1 score in 100 testing groups. Two multivariable LR models revealed that age, number of driver gene mutations, epidermal growth factor receptor (EGFR) gene mutations, type of prior chemotherapy, prior tyrosine kinase inhibitor (TKI) therapy, and the source of the sample were powerful predictors. Moreover, CatBoost (mean accuracy = 0.960; mean AUC = 0.939; mean F1 score = 0.908) and the eight-feature SVM-RFE (mean accuracy = 0.950; mean AUC = 0.934; mean F1 score = 0.903) showed the best performance among the algorithms. Meanwhile, applying SMOTE improved the predictive capability of most models, except CatBoost. With SMOTE, the ensemble classifier of single models achieved the highest accuracy (mean = 0.975), AUC (mean = 0.949), and F1 score (mean = 0.938). In conclusion, we established an optimal predictive model to screen lung cancer patients for non-obese diabetic (NOD)/Shi-scid, interleukin-2 receptor (IL-2R) γ^null (NOG)/PDX models and offer a general approach for building predictive models.
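The abstract names SMOTE and three evaluation metrics without showing how they work. The sketch below illustrates the general idea using only the Python standard library. It is not the authors' implementation: the two-feature data are hypothetical, the function names are chosen for illustration, and it assumes that the 11 mismatched cases (53 minus 42) are the minority class being oversampled. In practice, packages such as imbalanced-learn and scikit-learn would normally be used instead.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (the row itself excluded)
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(row, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc(y_true, scores, positive=1):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(42)
# Hypothetical feature values: 11 mismatched (minority) cases with two features each
mismatched = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(11)]
# Oversample the minority until it matches the 42 matched (majority) cases
synthetic = smote(mismatched, n_new=42 - 11, rng=rng)
balanced_minority = mismatched + synthetic
print(len(balanced_minority))                         # 42

# Toy labels and scores, only to show how the metrics are called
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))           # 0.75
print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))           # 0.8
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))        # 0.75
```

Because each synthetic row lies on a segment between two real minority rows, oversampling adds no values outside the observed range of the minority class. To avoid leaking synthetic rows into the evaluation, oversampling of this kind is usually applied to the training portion of each split only; the abstract does not specify how the 100 testing groups were constructed.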