Abstract
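Predicting risk pathogenic genes for lung cancer helps to elucidate disease pathogenesis and improve clinical treatment outcomes. Current risk pathogenic gene prediction algorithms built on the random walk with restart (RWR) framework commonly suffer from three problems: too few initial (seed) nodes, identical transition probabilities for all nodes, and a single information source. To address these problems, this paper proposes a risk pathogenic gene prediction algorithm based on expanded initial nodes and a weighted fusion strategy (named AFMFSC), and validates its effectiveness on lung cancer. First, based on the idea of the augmented fuzzy measure, augmented functional similarity scores are computed between genes of phenotypically similar diseases; the important genes selected from them, together with the known pathogenic genes, serve as the expanded initial nodes. Second, random walks with restart are run separately on the protein interaction network, one guided by a node topological similarity transition matrix and one by a differential gene expression correlation transition matrix, and the two results are combined by weighted fusion and ranked. Finally, enrichment analysis of the top-ranked genes yields statistically significant risk pathogenic genes. All 73 lung cancer risk pathogenic genes predicted by AFMFSC are closely associated with the onset and progression of lung cancer and are biologically significant. Compared with other ranking algorithms, AFMFSC achieves higher Top 1%, Top 5% and AUC values, a lower average rank, and less sensitivity to topological (degree-distribution) bias; the fusion strategy also ranks better than walks guided by a single transition matrix or by an ordinary adjacency matrix. AFMFSC not only predicts lung cancer risk pathogenic genes accurately and effectively, but can also be extended to predict risk pathogenic genes of other diseases, providing a new perspective and supporting evidence for exploring the pathogenesis of cancer.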
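The two-walk-plus-fusion step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the AFMFSC implementation: the topological transition matrix weights each PPI edge by the Jaccard similarity of the two closed neighbourhoods, the expression transition matrix uses absolute Pearson correlation of expression profiles as a stand-in for the paper's differential expression correlation, and the restart probability `restart=0.7` and fusion weight `alpha=0.5` are hypothetical defaults. All function names are illustrative, and the seed set is assumed to already contain the expanded initial nodes.

```python
import math

def row_normalize(weights):
    """Convert nonnegative edge weights {i: {j: w}} into a row-stochastic matrix {i: {j: P(i -> j)}}."""
    T = {}
    for i, nbrs in weights.items():
        total = sum(nbrs.values())
        T[i] = {j: w / total for j, w in nbrs.items()} if total > 0 else {}
    return T

def topo_similarity_weights(adj):
    """Assumption: weight edge (i, j) by the Jaccard similarity of their closed neighbourhoods."""
    W = {}
    for i, nbrs in adj.items():
        ni = nbrs | {i}
        W[i] = {j: len(ni & (adj[j] | {j})) / len(ni | (adj[j] | {j})) for j in nbrs}
    return W

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy) if sx > 0 and sy > 0 else 0.0

def expression_weights(adj, expr, floor=1e-6):
    """Assumption: weight edge (i, j) by |corr(expr[i], expr[j])| plus a small floor so no edge weight is 0."""
    return {i: {j: abs(pearson(expr[i], expr[j])) + floor for j in nbrs} for i, nbrs in adj.items()}

def rwr(T, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- (1 - r) * p T + r * p0 until the L1 change drops below tol.
    Assumes every node has at least one neighbour (no dangling rows)."""
    nodes = list(T)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for i in nodes:
            for j, t in T[i].items():
                nxt[j] += (1 - restart) * p[i] * t
        diff = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if diff < tol:
            break
    return p

def fused_ranking(adj, expr, seeds, alpha=0.5, restart=0.7):
    """Run both walks from the same seed set and rank genes by alpha * p_topo + (1 - alpha) * p_expr."""
    p_topo = rwr(row_normalize(topo_similarity_weights(adj)), seeds, restart)
    p_expr = rwr(row_normalize(expression_weights(adj, expr)), seeds, restart)
    score = {v: alpha * p_topo[v] + (1 - alpha) * p_expr[v] for v in adj}
    return sorted(score, key=score.get, reverse=True), score

# Toy PPI network (undirected, as symmetric neighbour sets) and toy expression profiles.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E"}, "E": {"D"}}
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [1, 3, 3, 5],
        "D": [4, 3, 2, 1], "E": [1, 2, 3, 5]}
ranking, score = fused_ranking(adj, expr, seeds={"A"})
print(ranking)
```

Because both transition matrices are row-stochastic, each walk's stationary vector sums to 1, so a convex combination with `alpha` in [0, 1] is itself a probability vector and the two walks contribute on the same scale.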
The identification of risk pathogenic genes for lung cancer helps to elucidate disease pathogenesis and improve clinical practice. However, existing prediction methods based on the random walk with restart (RWR) framework share three common problems: too few initial nodes, identical node transition probabilities, and a single information source. To improve the performance of the RWR framework, we propose a novel method named AFMFSC that identifies disease-related genes by expanding the initial nodes and applying a weighted fusion strategy, and we use lung cancer as the test case. AFMFSC first computes augmented functional similarity scores between genes of phenotypically similar diseases, based on the idea of augmented fuzzy measure similarity, and selects important genes that, together with the known pathogenic genes, form the expanded initial nodes. It then walks on the global PPI network separately, guided by a node similarity transition matrix constructed from the topological similarity of the PPI network and by a correlational transition matrix constructed from gene expression profiles. All genes in the network are ranked by a weighted fusion of the results obtained with the two transition matrices, and finally the top-ranked genes that are significant in enrichment analysis are taken as the risk pathogenic genes. Seventy-three significant genes are predicted to be risk pathogenic genes for lung cancer, and they are closely linked with the onset and progression of the disease. Compared with existing methods for prioritizing potential disease genes, AFMFSC achieves a smaller average rank, is less affected by degree-distribution bias, and attains higher Top 1%, Top 5% and AUC values. In addition, the fusion strategy outperforms ranking with a single transition matrix or with an ordinary adjacency matrix.
AFMFSC not only predicts the risk pathogenic genes of lung cancer accurately and effectively, but can also be readily extended to identify genes related to other diseases, providing additional insights into the pathogenesis of cancer.
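The evaluation measures named in the abstract (Top 1%, Top 5%, AUC and average rank) are commonly computed from the rank that each held-out known disease gene receives, for example in leave-one-out cross-validation. The sketch below assumes that protocol, that each held-out gene is ranked among a fixed-size candidate list, and that ties count against the held-out gene; the paper's actual candidate set and protocol are not stated here, and the function names are hypothetical.

```python
import math

def rank_of(gene, scores, candidates):
    """1-based rank of `gene` among `candidates` by descending score; ties count against the gene."""
    s = scores[gene]
    return 1 + sum(1 for c in candidates if c != gene and scores[c] >= s)

def evaluate_ranks(ranks, n_candidates):
    """Summarise the ranks of held-out known disease genes, each ranked among n_candidates genes."""
    m = len(ranks)

    def top_pct(pct):
        # Fraction of held-out genes ranked within the top pct% of the candidate list.
        cutoff = max(1, math.ceil(pct * n_candidates / 100))
        return sum(r <= cutoff for r in ranks) / m

    # With one positive per candidate list, the per-list AUC is the fraction of the other
    # n_candidates - 1 genes ranked below it; the reported AUC averages these over held-out genes.
    auc = sum((n_candidates - r) / (n_candidates - 1) for r in ranks) / m
    return {"top1": top_pct(1), "top5": top_pct(5), "auc": auc, "mean_rank": sum(ranks) / m}

scores = {"g1": 0.9, "g2": 0.5, "g3": 0.1}
print(rank_of("g2", scores, list(scores)))  # → 2
print(evaluate_ranks([1, 3, 50, 100], n_candidates=100))
```

A lower `mean_rank` and higher `top1`, `top5` and `auc` all indicate that known disease genes are pushed toward the top of the list, which is the direction of improvement the abstract reports.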
Source
Progress in Biochemistry and Biophysics (《生物化学与生物物理进展》)
SCIE
CAS
CSCD
Peking University Core Journals (北大核心)
2016, Issue 2, pp. 176-186 (11 pages)
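Funding
Supported by the National Natural Science Foundation of China (91430111, 61473232, 61170134) and the Young Scientists Fund of the National Natural Science Foundation of China (61502396)
Supported by the Sichuan Collaborative Innovation Center for Internet Finance Innovation and Regulation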
Keywords
risk pathogenic gene
expanded initial node
topological similarity transition matrix
gene expression difference correlational transition matrix
random walk with restart