Abstract
Missing values in anti-cancer drug sensitivity data can substantially affect downstream cancer data analyses. High-throughput sequencing technology makes it possible to build computational models that effectively predict anti-cancer drug sensitivity. This paper builds on two established, reasonable hypotheses: similar cell lines respond similarly to a given drug, and similar drugs elicit similar responses in a given cell line. By jointly considering the gene expression and gene mutation features of cell lines, a new definition of cell line similarity is given. Combined with a drug similarity measure, this leads to a computational model named "cell line-drug K nearest neighbor", which is applied to the Cancer Cell Line Encyclopedia (CCLE). The resulting drug sensitivity predictions are clearly better than those of existing classical models.
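The abstract does not give the model's formulas, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes, purely for illustration, that cell line similarity is a weighted blend of Pearson correlation on expression profiles and Jaccard overlap on binary mutation profiles, that a drug similarity matrix is already available, and that a missing entry is estimated by averaging a cell-line-side and a drug-side similarity-weighted K nearest neighbor estimate. The names `cell_line_similarity`, `knn_estimate`, `predict` and the weight `alpha` are hypothetical.

```python
import numpy as np

def cell_line_similarity(expr, mut, alpha=0.5):
    """Blend expression correlation with mutation overlap (assumed form)."""
    expr_sim = np.corrcoef(expr)              # Pearson correlation between rows
    mut = mut.astype(float)
    inter = mut @ mut.T                       # number of shared mutated genes
    counts = mut.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # Jaccard index; pairs with no mutations at all get similarity 0
    mut_sim = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    return alpha * expr_sim + (1 - alpha) * mut_sim

def knn_estimate(values, sims, k):
    """Similarity-weighted mean over the k most similar entries with known values."""
    idx = np.flatnonzero(~np.isnan(values))
    if idx.size == 0:
        return np.nan
    top = idx[np.argsort(sims[idx])[::-1][:k]]
    w = np.clip(sims[top], 0.0, None)         # negative similarities get zero weight
    if w.sum() == 0:
        return float(values[top].mean())
    return float(np.dot(w, values[top]) / w.sum())

def predict(resp, cell_sim, drug_sim, c, d, k=3):
    """Average the cell-line-side and drug-side estimates for entry (c, d)."""
    col = resp[:, d].copy()
    col[c] = np.nan                           # never use the target entry itself
    row = resp[c, :].copy()
    row[d] = np.nan
    est = [knn_estimate(col, cell_sim[c], k), knn_estimate(row, drug_sim[d], k)]
    est = [e for e in est if not np.isnan(e)]
    return float(np.mean(est)) if est else np.nan
```

Here `resp` is a cell line by drug sensitivity matrix with `np.nan` marking missing values. Averaging the two estimates is one simple way to combine both hypotheses; the paper may weight or combine them differently.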
Authors
WANG Bo; WEI Dong; LI Yu-shuang (School of Science, Yanshan University, Qinhuangdao 066004, China; Hebei Dataport Technology Co., Ltd, Qinhuangdao 066004, China)
Source
Mathematics in Practice and Theory (《数学的实践与认识》)
Peking University Core Journal (北大核心)
2020, Issue 4, pp. 295-300 (6 pages)
Funding
National Natural Science Foundation of China (61807029).
Keywords
anti-cancer drug sensitivity
cell line similarity
drug similarity
K nearest neighbor