Abstract
Different cell lines produce different perturbation signals in response to specific compounds, so predicting cell viability from these signals and uncovering the drug sensitivity hidden beneath the phenotype are important. We developed SAE-XGBoost, a cell viability prediction algorithm based on LINCS-L1000 perturbation signals. After matching and screening three major datasets (LINCS-L1000, Achilles and CTRP), a stacked deep autoencoder (SAE) was used to extract features from the gene information. These features were combined with the RW-XGBoost algorithm to predict drug-induced cell viability, and the model was then used to infer drug sensitivity on the NCI60 and CCLE datasets. Compared with other methods, the model performed well, reaching a Pearson correlation coefficient of 0.85; on an independent validation set it reached a Pearson correlation coefficient of 0.68. These results indicate that the proposed method can help discover novel, effective anticancer drugs and support precision medicine.
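The abstract reports model quality as the Pearson correlation coefficient between measured and predicted cell viability. The following is a minimal Python sketch of that statistic, assuming plain lists of measured and predicted viability values; the function name `pearson_r` and the example numbers are hypothetical illustrations, not code or data from the paper.

```python
import math
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values (hypothetical helper)."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sum of cross-products of deviations from the means (covariance numerator)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    # Sums of squared deviations for each sequence
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("r is undefined when either sequence is constant")
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Hypothetical measured vs. predicted viabilities for four compound/cell-line pairs
    measured = [0.9, 0.7, 0.4, 0.2]
    predicted = [0.85, 0.75, 0.35, 0.25]
    print(f"Pearson r = {pearson_r(measured, predicted):.3f}")
```

A value near 1 means predictions rise and fall with the measurements; the 0.85 versus 0.68 gap in the abstract reflects weaker agreement on the independent set. For real analyses, `scipy.stats.pearsonr` computes the same coefficient along with a p-value.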
Authors
陆家兴
陈明
秦玉芳
于晓庆
Jiaxing Lu; Ming Chen; Yufang Qin; Xiaoqing Yu (College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; School of Sciences, Shanghai Institute of Technology, Shanghai 201418, China)
Source
Chinese Journal of Biotechnology (《生物工程学报》)
Indexed in: CAS, CSCD, Peking University Core Journals
2021, No. 4, pp. 1346-1359 (14 pages)
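Funding
Shanghai Science and Technology Innovation Program (No. 20dz1203800)
National Natural Science Foundation of China (Nos. 61702325, 11701379)
National Key Research and Development Program of China (No. 2018YFD0701003)
Shanghai Science and Technology Innovation Action Plan (No. 16391902900)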