Abstract
Adverse drug reactions (ADRs) are an important consideration in the treatment of disease. To help doctors avoid or reduce ADRs, we propose a drug side effect prediction method based on multiple data sources and machine learning. The changes in gene expression before and after disease onset and before and after medication are used as explanatory variables, and a random forest algorithm is used to predict drug side effects. For all five side effects tested, the random forest algorithm outperformed the traditional K-nearest neighbor (KNN) algorithm; for rash, the test accuracy reached 90.24%, which was 31.70% higher than that of the KNN algorithm. The results show that changes in gene expression can be used to predict drug side effects well, and that the occurrence of side effects is strongly correlated with changes in gene expression.
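The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and uses synthetic data: the array names (`disease_change`, `drug_change`), the sample sizes, the label rule, and the hyperparameters are all hypothetical and are not taken from the paper. It shows only the general pattern of concatenating expression-change features and comparing a random forest against KNN on a held-out test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_drugs, n_genes = 200, 50  # hypothetical sizes, not from the paper

# Hypothetical inputs: per-gene expression change caused by the disease
# (before vs. after onset) and by the drug (before vs. after medication).
disease_change = rng.normal(size=(n_drugs, n_genes))
drug_change = rng.normal(size=(n_drugs, n_genes))

# Explanatory variables: both change vectors concatenated per drug.
X = np.hstack([disease_change, drug_change])

# Synthetic binary label for a single side effect (1 = side effect reported).
signal = drug_change[:, 0] + disease_change[:, 1]
y = (signal + 0.5 * rng.normal(size=n_drugs) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit both classifiers on the same split and compare test accuracy.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {accuracy[name]:.4f}")
```

In a real setting, one such binary classifier would be trained per side effect, and the accuracies obtained on synthetic data here say nothing about the figures reported in the paper.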
Author
杜瑶
DU Yao (Business School, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《软件导刊》
2021, No. 5, pp. 39-43 (5 pages)
Software Guide
Keywords
side effect prediction
gene expression
machine learning
multiple data sources