Abstract
A growing body of research shows that most common human diseases are complex diseases, caused by the combined action of multiple genes, multiple factors, heredity, and the environment. Thanks to the rapid development of high-throughput biotechnology, biologists have built protein interaction networks. Using complex network analysis to identify disease-related protein subnetworks within these networks would help us understand more deeply how organisms work. In this paper, we propose a search method based on a greedy algorithm. The method automatically searches the whole network for subnetworks or modules and, combined with microarray data, performs Student's t-tests to evaluate how well each subnetwork discriminates between disease phenotypes. It then computes a p-value for each subnetwork as a measure of its statistical significance and ranks the subnetworks by discriminating ability. Our results show that the method can not only recover known disease proteins but also predict previously unknown ones. Combined with biochip technology, it can provide valuable information for research on disease genes.
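The abstract does not spell out the implementation, so the following is only a minimal C++ sketch of how a greedy, t-score-driven subnetwork search could look. Everything in it is an assumption made for illustration: the data layout (`Matrix`, `Graph`, `isCase`), the function names, the use of the mean expression of the member proteins as the per-sample subnetwork activity, and the choice of Welch's unequal-variance form of the t statistic. The p-value computation and the final ranking mentioned in the abstract are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical data layout (not taken from the paper):
//   expr[g][s] : expression of protein/gene g in sample s
//   adj[g]     : interaction partners of g in the protein network
//   isCase[s]  : true for disease samples, false for controls
using Matrix = std::vector<std::vector<double>>;
using Graph = std::vector<std::vector<int>>;

// Welch's t statistic between two groups of values.
// Returns 0 when both groups have zero variance (a simplification).
double welchT(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() >= 2 && b.size() >= 2);
    auto meanVar = [](const std::vector<double>& v, double& m, double& var) {
        m = 0.0;
        for (double x : v) m += x;
        m /= static_cast<double>(v.size());
        var = 0.0;
        for (double x : v) var += (x - m) * (x - m);
        var /= static_cast<double>(v.size() - 1);  // sample variance
    };
    double ma, va, mb, vb;
    meanVar(a, ma, va);
    meanVar(b, mb, vb);
    double se = std::sqrt(va / a.size() + vb / b.size());
    return se > 0.0 ? (ma - mb) / se : 0.0;
}

// Score of a subnetwork: |t| of its per-sample activity (mean expression
// of its members) between disease and control samples.
double subnetScore(const std::set<int>& members, const Matrix& expr,
                   const std::vector<bool>& isCase) {
    std::vector<double> cases, controls;
    for (std::size_t s = 0; s < isCase.size(); ++s) {
        double activity = 0.0;
        for (int g : members) activity += expr[g][s];
        activity /= static_cast<double>(members.size());
        if (isCase[s]) cases.push_back(activity);
        else controls.push_back(activity);
    }
    return std::fabs(welchT(cases, controls));
}

// Greedy growth: starting from a seed, repeatedly add the neighbouring
// protein that raises the score the most; stop when none improves it.
std::set<int> growSubnetwork(int seed, const Graph& adj, const Matrix& expr,
                             const std::vector<bool>& isCase) {
    std::set<int> members{seed};
    double best = subnetScore(members, expr, isCase);
    while (true) {
        int bestNode = -1;
        double bestScore = best;
        for (int g : members) {
            for (int nb : adj[g]) {
                if (members.count(nb)) continue;
                std::set<int> trial = members;  // candidate subnetwork
                trial.insert(nb);
                double s = subnetScore(trial, expr, isCase);
                if (s > bestScore) { bestScore = s; bestNode = nb; }
            }
        }
        if (bestNode < 0) break;  // local optimum reached
        members.insert(bestNode);
        best = bestScore;
    }
    return members;
}
```

Under these assumptions, calling `growSubnetwork` once per seed protein and sorting the resulting subnetworks by `subnetScore` would give a ranking by discriminating ability. Being greedy, the search only reaches a local optimum for each seed.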
Source
《微计算机信息》
2010, No. 6, pp. 188-189, 217 (3 pages)
Control & Automation
Keywords
protein-protein interaction network
greedy algorithm
protein subnetwork
C++