Abstract
By integrating three types of omics data (somatic mutations, copy number variations, and gene expression), this paper proposes an improved maximum weight submatrix model for identifying driver pathways in cancer. The model uses the average weight of the genes in a pathway to balance coverage against mutual exclusivity: it strengthens coverage for gene sets with large weights while relaxing their high mutual-exclusivity constraint. A parthenogenetic algorithm, PGA-MWS, which introduces a greedy-based recombination operator, is presented to solve the model. PGA-MWS and GA were compared experimentally on glioblastoma and ovarian cancer datasets. The results show that, compared with GA, PGA-MWS based on the improved model can identify gene sets with high coverage but only moderate mutual exclusivity. Many of the identified gene sets are involved in known signaling pathways and have been confirmed to be closely related to cancer cells, and several potential candidate driver pathways can also be identified. PGA-MWS may therefore serve as a useful complement to existing methods for detecting cancer driver pathways.
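The abstract does not state the improved model's formula, so the following is only a minimal sketch of the classical maximum weight submatrix score that such models build on: coverage (patients altered in at least one gene of the set) minus coverage overlap (the excess alterations that violate mutual exclusivity). The data layout and the names `coverage`, `coverage_overlap`, and `classical_weight` are assumptions made for illustration; the paper's average-gene-weight adjustment and the PGA-MWS search are not reproduced here.

```python
from typing import Dict, Iterable, Set

# Hypothetical input layout: gene name -> set of patient IDs in which the gene is altered.
MutationMap = Dict[str, Set[int]]


def coverage(mutations: MutationMap, gene_set: Iterable[str]) -> Set[int]:
    """Patients with an alteration in at least one gene of the set."""
    covered: Set[int] = set()
    for gene in gene_set:
        covered |= mutations.get(gene, set())
    return covered


def coverage_overlap(mutations: MutationMap, gene_set: Iterable[str]) -> int:
    """Alterations beyond the first per covered patient (0 means fully exclusive)."""
    genes = list(gene_set)
    total = sum(len(mutations.get(gene, set())) for gene in genes)
    return total - len(coverage(mutations, genes))


def classical_weight(mutations: MutationMap, gene_set: Iterable[str]) -> int:
    """Classical score: coverage size minus coverage overlap."""
    genes = list(gene_set)
    return len(coverage(mutations, genes)) - coverage_overlap(mutations, genes)


# Toy data (invented for illustration, not taken from the paper's datasets).
toy = {"A": {1, 2}, "B": {3}, "C": {2, 3}}
print(classical_weight(toy, ["A", "B"]))  # 3: three patients covered, no overlap
print(classical_weight(toy, ["A", "C"]))  # 2: three patients covered, one overlap
```

In this classical form, overlap is penalized as heavily as coverage is rewarded; the model described above instead relaxes the exclusivity penalty for gene sets with large average weight.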
Authors
CAI Qi-rong (蔡齐荣)
WU Jing-li (吴璟莉)
College of Computer Science and Information Technology, Guangxi Normal University, Guilin, Guangxi 541004, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi 541004, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2019, Issue 9, pp. 310-314 (5 pages)
Funding
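National Natural Science Foundation of China (61762015, 61502111, 61662007, 61763003)
Guangxi Natural Science Foundation (2015GXNSFAA139288, 2016GXNSFAA380192)
Innovation Project of Guangxi Graduate Education (XYCSZ2018078)
"Bagui Scholar" Program special fund; Systematic Research Fund of the Guangxi Key Lab of Multi-source Information Mining & Security (14-A-03-02, 15-A-03-02)
Guangxi Science and Technology Base and Talent Special Project (AD16380008)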
Keywords
Driver pathway
Multi-omics data
Cancer
Algorithm
Model