
A Parallel Clustering Algorithm of Gene Expression Patterns
Cited by: 11
Abstract: Cross-species sequence comparison has been widely used to predict gene function, but a growing number of experiments indicate that sequence similarity alone does not guarantee functional similarity. To determine the function of a gene precisely, one must consider not only its sequence characteristics but also its expression information, since changes in gene expression often accompany changes in gene function. Clustering gene expression patterns helps to identify co-expressed genes and their regulation, which is an important step in studying gene function. Because of the complexity of gene expression in biological tissue and the continual advance of microarray technology, the volume of expression data is growing exponentially, and clustering analysis faces a major bottleneck: excessive time and space complexity. Based on the data characteristics of gene expression patterns, this paper proposes a Parallel Hierarchical Clustering Algorithm (PHCA) derived from hierarchical clustering and implements it with MPI. The algorithm focuses on solving the load-balancing problem of the parallel design. Parallel performance analysis indicates that PHCA substantially reduces the time and space complexity of hierarchical clustering.
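The abstract does not say how PHCA balances load, so the following is only an illustrative sketch of why load balance matters when the pairwise distances behind hierarchical clustering are distributed across processes. It assumes, hypothetically, that each process computes the upper-triangle distances for a contiguous block of gene rows. The function names `equal_rows_split`, `balanced_split`, and `loads` are invented for this example and are not taken from the paper.

```python
def equal_rows_split(n, p):
    """Naive split: each of p workers gets about n/p consecutive rows."""
    bounds = [k * n // p for k in range(p + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(p)]


def balanced_split(n, p):
    """Split rows 0..n-1 (n >= 1) into p contiguous blocks with similar pair counts.

    Row i of the upper triangle owns the n - 1 - i pairs (i, j) with j > i,
    so early rows are heavier than late rows. A block is closed as soon as
    the cumulative pair count reaches the next multiple of total / p.
    """
    total = n * (n - 1) // 2
    blocks, start, done = [], 0, 0
    for i in range(n):
        done += n - 1 - i
        # Close blocks (possibly empty ones when n is tiny) until the
        # cumulative work no longer exceeds the next threshold.
        while len(blocks) < p - 1 and done * p >= (len(blocks) + 1) * total:
            blocks.append((start, i + 1))
            start = i + 1
    blocks.append((start, n))
    return blocks


def loads(n, blocks):
    """Number of distance computations assigned to each (start, stop) block."""
    return [sum(n - 1 - i for i in range(a, b)) for a, b in blocks]


if __name__ == "__main__":
    n, p = 1000, 4  # hypothetical: 1000 genes, 4 processes
    print("equal rows:", loads(n, equal_rows_split(n, p)))
    print("balanced:  ", loads(n, balanced_split(n, p)))
```

With 1000 genes and 4 processes, the equal-rows split hands the first process several times as many pairs as the last, whereas the balanced split keeps every block within roughly n pairs of the ideal share. In an MPI program, block boundaries like these could serve as the per-rank row ranges, though whether PHCA partitions work this way is not stated in the abstract.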
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 2, pp. 311-316 (6 pages)
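Funding: Supported by the National Natural Science Foundation of China (60533020, 60673064) and by the Ministry of Science and Technology of China project "Construction of Grid Computing Application Systems for Astronomy, Bioinformatics, and Computational Chemistry" (2005DKA64002).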
Keywords: clustering analysis; gene expression patterns; hierarchical clustering; load balance
