Abstract
Predicting the sensitivity of tumors to drugs is important for personalized, precise medication. Using the GDSC database, this study builds a prediction model for RNA-seq gene expression and anti-cancer drug sensitivity data through Boosting ensemble learning. The data sets for 183 drugs are each normalized and reduced in gene feature dimensionality; an AdaBoost ensemble of SVMs (AdaBoost+SVM) is then used for modeling, with 10-fold cross-validation. The experimental results show that the model achieves high prediction accuracy: the AUC exceeds 0.95 for 13 drugs, 0.90 for 108 drugs, and 0.80 for 174 drugs. In comparison experiments, AdaBoost+SVM improves the comprehensive evaluation index over the whole drug set by about 4% relative to single-learner models and by 2% relative to other ensemble models. The study also examines drug specificity: drug action pathways are verified through feature selection and enrichment analysis, which provides interpretability for the model from a biological perspective and demonstrates its value for guiding clinical medication.
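The per-drug AUC tallies reported in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes each drug is treated as a binary task (1 = sensitive, 0 = resistant, a labeling chosen here for illustration), computes AUC with the standard pairwise (Mann-Whitney) formulation, and counts how many drugs exceed each threshold. The function names, drug names, and toy scores are all hypothetical.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def count_above(aucs: Dict[str, float], thresholds: Sequence[float]) -> Dict[float, int]:
    """Number of drugs whose AUC is strictly greater than each threshold."""
    return {t: sum(1 for a in aucs.values() if a > t) for t in thresholds}


# Hypothetical toy predictions: (true labels, model scores) per drug.
toy = {
    "drug_A": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]),
    "drug_B": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]),
    "drug_C": ([1, 1, 0, 0], [0.6, 0.3, 0.5, 0.2]),
}

per_drug_auc = {name: roc_auc(y, s) for name, (y, s) in toy.items()}
summary = count_above(per_drug_auc, (0.95, 0.90, 0.80))

for name, auc in per_drug_auc.items():
    print(f"{name}: AUC = {auc:.3f}")
print(summary)
```

In a cross-validated setting such as the 10-fold scheme described above, the scores passed to `roc_auc` would come from held-out folds rather than from the training data.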
Authors
黄鹏杰
林勇
张梦欢
吕琳
刘振浩
裴潇倜
许林锋
谢鹭
HUANG Pengjie; LIN Yong; ZHANG Menghuan; LÜ Lin; LIU Zhenhao; PEI Xiaoti; XU Linfeng; XIE Lu (School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Shanghai Center for Bioinformation Technology, Shanghai 201203, China; Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China)
Source
《中国医学物理学杂志》
CSCD
2021, Issue 4, pp. 511-517 (7 pages)
Chinese Journal of Medical Physics
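Funding
National Natural Science Foundation of China, Young Scientists Fund (31800700)
National Natural Science Foundation of China (31301092)
Shanghai Municipal Health and Family Planning Commission Collaborative Innovation Cluster Project (2019CXJQ02)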