Abstract
Research on the etiology and pathogenesis of prostate cancer supports its prevention and treatment. Current biochemical experimental methods for prostate cancer are costly and time-consuming, while network-based computational methods are constrained by gene expression profile data that are incomplete, noisy, and drawn from small numbers of samples. We therefore propose NMCOM, a dual-constraint algorithm based on node-module confidence (the confidence that a vertex belongs to a module) and local modularity, to mine candidate disease modules of prostate cancer. NMCOM does not depend on gene expression profile data. It first prioritizes candidate genes with a fusion ranking strategy that combines the concordance scores between candidate genes and causative phenotypes with the semantic similarity scores between candidate genes and causative genes, and selects starting nodes from this ranking. It then mines candidate disease modules under the dual constraint of node-module confidence and local modularity. Enrichment analysis of the mined modules yielded 18 significant candidate disease gene modules. Compared with single-score ranking methods and random walk with restart, the NMCOM fusion ranking strategy achieved a smaller mean rank ratio (MRR) and a larger AUC value. The mined results were clearly better than those of other module mining algorithms, and the modules were biologically significant. NMCOM not only mines candidate disease modules of prostate cancer accurately and effectively but can also be extended to mine candidate modules for other diseases.
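The pipeline outlined in the abstract (fuse two gene rankings, pick a starting node, then grow a module under two constraints) can be illustrated with a minimal sketch. The abstract does not give NMCOM's formulas, so everything below is an assumption made for illustration: the fusion is a plain average of rank positions, node-module confidence is taken as the fraction of a node's neighbours already in the module, local modularity is taken as internal edges divided by boundary edges, and the toy graph, scores, function names, and the `min_conf` threshold are hypothetical.

```python
from collections import defaultdict

def fuse_rankings(score_a, score_b):
    """Order genes by the average of their rank positions under two scores.
    Assumption: simple rank averaging, not necessarily NMCOM's fusion rule."""
    def ranks(scores):
        ordered = sorted(scores, key=lambda g: (-scores[g], g))
        return {g: i + 1 for i, g in enumerate(ordered)}
    ra, rb = ranks(score_a), ranks(score_b)
    fused = {g: (ra[g] + rb[g]) / 2 for g in score_a}
    return sorted(fused, key=lambda g: (fused[g], g))

def local_modularity(adj, module):
    """Assumed definition: edges inside the module / edges leaving it."""
    inside = sum(1 for u in module for v in adj[u] if v in module) // 2
    outside = sum(1 for u in module for v in adj[u] if v not in module)
    return inside / outside if outside else float("inf")

def confidence(adj, node, module):
    """Assumed definition: share of the node's neighbours inside the module."""
    return sum(1 for v in adj[node] if v in module) / len(adj[node])

def grow_module(adj, seed, min_conf=0.5):
    """Greedily add the neighbour that most increases local modularity,
    considering only neighbours whose confidence reaches min_conf."""
    module = {seed}
    while True:
        frontier = {v for u in module for v in adj[u]} - module
        best, best_q = None, local_modularity(adj, module)
        for v in sorted(frontier):
            if confidence(adj, v, module) < min_conf:
                continue  # first constraint: node-module confidence
            q = local_modularity(adj, module | {v})
            if q > best_q:  # second constraint: local modularity must rise
                best, best_q = v, q
        if best is None:
            return module
        module.add(best)

# Hypothetical toy network: two triangles joined by the edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Hypothetical scores for three candidate genes.
concordance = {"A": 0.9, "B": 0.4, "D": 0.7}
similarity = {"A": 0.8, "B": 0.9, "D": 0.1}

ranked = fuse_rankings(concordance, similarity)
module = grow_module(adj, ranked[0])
print(ranked, sorted(module))
```

On this toy input the top-ranked gene seeds the expansion, and growth stops at the bridge edge because the node across it fails the confidence threshold. A real application would replace the toy scores with phenotype concordance and semantic similarity scores and would use the definitions from the full paper.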
Source
《生物化学与生物物理进展》
SCIE
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, Issue 4, pp. 375-389 (15 pages)
Progress in Biochemistry and Biophysics
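Funding
National Natural Science Foundation of China (91430111, 61473232, 61170134)
Southwestern University of Finance and Economics, Fundamental Research Funds for the Central Universities, Young Faculty Development Project (JBK150134)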
Keywords
prostate cancer
disease module mining
candidate gene prioritization
node-module confidence
local modularity