In recent years, personalized medicine has attracted wide attention from researchers, and predicting anticancer drug sensitivity is one of its major challenges. This paper uses CCLE as the dataset for studying anticancer drug sensitivity, selecting gene expression data and drug sensitivity data across different cell lines. We propose PCA Transformer (PCAT), a hybrid method that combines deep learning and machine learning, to predict anticancer drug sensitivity. First, a PCA model extracts the important variables from the gene expression data across cell lines, reducing the gene dimension from about 50,000 to 500. A Transformer neural network is then built on the reduced gene expression values to predict drug sensitivity. Model performance is evaluated by root mean square error (RMSE), and the model built with the best-performing number of latent variables is taken as the final model. To verify the performance of PCA Transformer, the Transformer model is compared with two other prediction models, random forest (RF) and support vector regression (SVR); to exclude the influence of the dimensionality reduction method, PCA is used for dimensionality reduction in all cases. The specific combinations are PCA Transformer, PCA + SVR, and PCA + RF. Finally, the results are compared with those of a previous method (ISIRS) and optimized. For the 24 drugs in CCLE, the proposed method achieves an average RMSE of 0.7564; 6 drugs have an RMSE below 0.5 (L-685458, PF2341066, etc.) and 18 drugs have an RMSE below 1. The average RMSEs of the comparison methods are 0.8284 (PCA + SVR), 0.8757 (PCA + RF), and 0.9258 (ISIRS), indicating that the proposed method has stronger generalization ability.
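The baseline comparison described in the abstract (PCA for dimensionality reduction, followed by SVR or RF, scored by RMSE) can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' implementation: the helper names `rmse` and `evaluate_baselines`, the train/test split, the default hyperparameters, and the small component count are all assumptions made for the example. The Transformer model itself is not sketched because the abstract gives no architectural details.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def evaluate_baselines(X, y, n_components=20, seed=0):
    """Fit PCA + SVR and PCA + RF on a train split; return test RMSE per model.

    Hypothetical helper: the split ratio and hyperparameters are assumptions,
    not values taken from the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        # PCA is fitted on the training split only, inside each pipeline.
        "PCA + SVR": make_pipeline(
            PCA(n_components=n_components, random_state=seed),
            SVR(),
        ),
        "PCA + RF": make_pipeline(
            PCA(n_components=n_components, random_state=seed),
            RandomForestRegressor(n_estimators=100, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = rmse(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for one drug: 200 "cell lines" x 1000 "genes".
# The real setting in the abstract is about 50,000 genes reduced to 500.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)

scores = evaluate_baselines(X, y)
for name, value in scores.items():
    print(f"{name}: RMSE = {value:.4f}")
```

In the setting the abstract describes, this evaluation would be repeated for each of the 24 drugs and the per-drug RMSE values averaged. Choosing the number of components would call for a separate validation split or cross-validation so that the test RMSE stays unbiased.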