BACKGROUND Thalidomide is an effective treatment for refractory Crohn’s disease (CD). However, thalidomide-induced peripheral neuropathy (TiPN), which varies widely between individuals, is a major cause of treatment failure. TiPN is difficult to predict and is rarely recognized, especially in CD. A risk model to predict TiPN occurrence is therefore needed.

AIM To develop and compare machine learning models that predict TiPN from comprehensive clinical and genetic variables.

METHODS A retrospective cohort of 164 CD patients from January 2016 to June 2022 was used to establish the models. TiPN was assessed with the National Cancer Institute Common Toxicity Criteria Sensory Scale (version 4.0). Using 18 clinical features and 150 genetic variables, five predictive models were established and evaluated with the confusion matrix, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), specificity, sensitivity (recall), precision, accuracy, and F1 score.

RESULTS The five top-ranking risk variables associated with TiPN were interleukin-12 rs1353248 [P = 0.0004, odds ratio (OR): 8.983, 95% confidence interval (CI): 2.497-30.90], dose (mg/d, P = 0.002), brain-derived neurotrophic factor (BDNF) rs2030324 (P = 0.001, OR: 3.164, 95% CI: 1.561-6.434), BDNF rs6265 (P = 0.001, OR: 3.150, 95% CI: 1.546-6.073), and BDNF rs11030104 (P = 0.001, OR: 3.091, 95% CI: 1.525-5.960). In the training set, gradient boosting decision tree (GBDT), extremely randomized trees (ET), random forest (RF), logistic regression, and extreme gradient boosting (XGBoost) all obtained AUROC values > 0.90 and AUPRC values > 0.87. Among these models, XGBoost and GBDT ranked highest, with AUROC of 0.90 and 1, AUPRC of 0.98 and 1, accuracy of 0.96 and 0.98, precision of 0.90 and 0.95, F1 score of 0.95 and 0.98, and specificity of 0.94 and 0.97, respectively, and a sensitivity of 1 for both. In the validation set, XGBoost showed the best predictive performance, with the highest specificity (0.857), accuracy (0.818), AUPRC (0.86), and AUROC (0.89). ET and GBDT obtained the highest sensitivity (1) and F1 score (0.8). Overall, compared with other state-of-the-art classifiers such as ET, GBDT, and RF, XGBoost not only performed more stably but also yielded higher AUROC and AUPRC, demonstrating its high accuracy in predicting TiPN occurrence.

CONCLUSION The XGBoost algorithm accurately predicts TiPN using 18 clinical features and 14 genetic variables. Because it can identify high-risk patients from single nucleotide polymorphisms, it offers a feasible option for improving the efficacy of thalidomide in CD patients.
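The METHODS and RESULTS name several evaluation metrics and report odds ratios with 95% CIs. The sketch below shows, in stdlib-only Python, how these quantities are commonly computed for a binary outcome (1 = TiPN, 0 = no TiPN). It is not the authors' code. The function names, the 0.5 decision threshold, and the toy labels, scores, and 2×2 counts are hypothetical. AUPRC is estimated as average precision, which is one of several estimators and here assumes untied scores. The odds ratio uses a Wald interval on a 2×2 table, whereas the abstract does not state how its ORs and CIs were estimated.

```python
from __future__ import annotations

import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels (1 = TiPN, 0 = no TiPN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Metrics derived from the confusion matrix at a fixed decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC = P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Assumes no tied scores (tied items keep their input order)."""
    ranked = sorted(zip(scores, y_true), key=lambda x: -x[0])
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """OR with a Wald CI on the log scale for a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    Zero cells would need a continuity correction (not handled here)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


if __name__ == "__main__":
    # Hypothetical toy data, NOT from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(threshold_metrics(y_true, y_pred))
    print(auroc(y_true, scores), average_precision(y_true, scores))
    print(odds_ratio_wald(20, 10, 15, 30))  # hypothetical 2x2 counts
```

With this toy data, the confusion matrix is TP = 3, FP = 1, TN = 3, FN = 1, so sensitivity, specificity, precision, accuracy, and F1 all equal 0.75. AUROC is 14/16 = 0.875 and average precision is 0.8875. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` would normally replace the hand-written AUROC and AUPRC functions.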
Funding: National Natural Science Foundation of China, No. 81973398, No. 81730103, No. 81573507 and No. 82020108031; The National Key Research and Development Program, No. 2017YFC0909300 and No. 2016YFC0905001; Guangdong Provincial Key Laboratory of Construction Foundation, No. 2017B030314030 and No. 2020B1212060034; Science and Technology Program of Guangzhou, No. 201607020031; National Engineering and Technology Research Center for New Drug Druggability Evaluation (Seed Program of Guangdong Province), No. 2017B090903004; The 111 Project, No. B16047; China Postdoctoral Science Foundation, No. 2019M66324, No. 2020M683140 and No. 2020M683139; Natural Science Foundation of Guangdong Province, No. 2022A1515012549 and No. 2023A1515012667.