A Novel Approach to Revealing Positive and Negative Co-Regulated Genes (cited by: 2)

Abstract: Biologists have identified a real and growing need to find co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed to find only positively regulated gene clusters. This paper proposes a new subspace clustering model for gene expression data, called g-Cluster. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes at once, 2) it removes the restriction that co-regulated genes be related by a magnitude transformation, and 3) it guarantees the quality of clusters and the significance of regulations by means of a novel similarity measure, gCode, and a user-specified regulation threshold δ, respectively. No previous work addresses this task in full. Moreover, the MDL (minimum description length) technique is introduced to avoid generating insignificant g-Clusters. A tree structure, the GS-tree, is also designed, along with two algorithms that combine efficient pruning and optimization strategies to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithms find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and 2) the algorithms are effective and efficient, and outperform the existing approaches.
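The abstract does not define gCode, so the following is only a rough illustration of the general idea of positive versus negative co-regulation under a regulation threshold δ. It is not the paper's gCode or g-Cluster algorithm. The names `trend_code` and `relation`, and the rule of encoding each step between consecutive conditions as up, down, or flat, are assumptions made for this sketch.

```python
from typing import List, Sequence


def trend_code(expr: Sequence[float], delta: float) -> List[int]:
    """Encode each step between consecutive conditions.

    +1 if expression rises by more than delta, -1 if it falls by more
    than delta, 0 otherwise. This is an illustrative encoding only,
    NOT the gCode measure defined in the paper.
    """
    code = []
    for prev, curr in zip(expr, expr[1:]):
        diff = curr - prev
        if diff > delta:
            code.append(1)
        elif diff < -delta:
            code.append(-1)
        else:
            code.append(0)
    return code


def relation(a: Sequence[float], b: Sequence[float], delta: float) -> str:
    """Classify two expression profiles (hypothetical helper).

    Returns "positive" if both profiles change significantly in the same
    direction at every step, "negative" if they change in opposite
    directions at every step, and "none" otherwise.
    """
    code_a, code_b = trend_code(a, delta), trend_code(b, delta)
    if not any(code_a):
        # No change exceeds delta, so there is no significant regulation.
        return "none"
    if code_a == code_b:
        return "positive"
    if code_a == [-c for c in code_b]:
        return "negative"
    return "none"


if __name__ == "__main__":
    g1 = [1.0, 3.0, 2.0, 5.0]
    g2 = [10.0, 14.0, 11.0, 20.0]  # same trend as g1, different magnitudes
    g3 = [9.0, 4.0, 6.0, 1.0]      # opposite trend to g1
    print(relation(g1, g2, delta=0.5))
    print(relation(g1, g3, delta=0.5))
```

Because the comparison uses only the direction of each change, `g1` and `g2` match even though their magnitudes are not related by a fixed shift or scale, which loosely mirrors advantage 2) in the abstract. Raising δ makes the sketch ignore smaller fluctuations.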
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2007, Issue 2, pp. 261-272 (12 pages)
Funding: This work is supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2006CB303103) and the National Natural Science Foundation of China (Grant Nos. 60573089, 60273079, and 60473074).
Keywords: microarray data, pattern-based clustering, co-regulated genes