
Gen-Cluster: An Efficient High-Dimensional Clustering Algorithm for Gene Expression Data (cited by: 2)
Abstract: Clustering gene expression data is an important means of analyzing co-regulation relationships among genes. One of the main research problems in this area is mining, within subspaces, genes whose expression values differ but whose trends of change are conserved. Building on the definition of N-same-dimensional tendency similarity, this paper proposes Gen-Cluster, a high-dimensional clustering algorithm for gene expression data. Gen-Cluster converts gene expression values into sequence form and mines N-same-dimensional tendency patterns bottom-up, using a sequential pattern mining strategy with non-duplicate projection and no candidate generation. It also handles sequential patterns that contain item sets, which the OP-Cluster algorithm cannot. The output is a set of N-same-dimensional tendency clusters formed by gene sequences with conserved expression trends. Experiments on the Breast Tumor and MicroRNA expression data sets indicate that the mined results are valid, that Gen-Cluster is more efficient than OP-Cluster, and that its results cover those of OP-Cluster.
Source: Journal of Fudan University (Natural Science), 2008, No. 2, pp. 135-146 (12 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60573093); National 863 Program of China (2006AA02Z329)
Keywords: high-dimensional data mining; clustering; gene expression data; N-same-dimensional tendency similarity
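The abstract's first step, turning expression values into sequence form, can be illustrated with a small sketch. This is not the paper's implementation: the function name `to_tendency_sequence`, the grouping threshold `delta`, and the rule for merging near-equal values into one item set are all assumptions made for illustration, and the paper's actual definition of N-same-dimensional tendency similarity may differ.

```python
from typing import List, Sequence


def to_tendency_sequence(values: Sequence[float], delta: float) -> List[List[int]]:
    """Turn one gene's expression row into an ordered sequence of item sets.

    Condition indices are ordered by ascending expression value. Conditions
    whose value lies within `delta` of the smallest value in the current
    group are merged into one item set (hypothetical grouping rule).
    """
    # Indices of the conditions, sorted by their expression value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    groups: List[List[int]] = []
    for idx in order:
        # groups[-1][0] is the first (smallest-valued) member of the open group.
        if groups and values[idx] - values[groups[-1][0]] <= delta:
            groups[-1].append(idx)
        else:
            groups.append([idx])

    # Sort indices inside each item set so equal sets compare equal.
    return [sorted(g) for g in groups]


# Two hypothetical genes with very different magnitudes but the same trend.
gene_a = [0.5, 2.0, 0.6, 3.1]
gene_b = [10.0, 40.0, 10.1, 90.0]
seq_a = to_tendency_sequence(gene_a, delta=0.2)
seq_b = to_tendency_sequence(gene_b, delta=0.2)
```

Under these assumptions both genes map to the same sequence of item sets even though their expression values differ, which is the kind of conserved trend a sequential pattern miner could then group into a cluster.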

References (17 in total)

  • 1 Moreau Y, Smet F D, Thijs G, et al. Functional bioinformatics of microarray data: from expression to regulation[J]. Proceedings of the IEEE, 2002, 90(11): 1722-1743.
  • 2 Mao L Y, Mackenzie C, Roh J H, et al. Combining microarray and genomic data to predict DNA binding motifs[J]. Microbiology, 2005, 151(10): 3197-3213.
  • 3 Madeira S C, Oliveira A L. Biclustering algorithms for biological data analysis: a survey[J]. IEEE/ACM Trans Comput Biol Bioinform, 2004, 1(1): 24-45.
  • 4 Cheng Y, Church G. Biclustering of expression data[C]//Bourne P, Gribskov M, Altman R, et al. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. San Diego: AAAI Press, 2000: 93-103.
  • 5 Wang H X, Wang W, Yang J, et al. Clustering by pattern similarity in large data sets[C]//Franklin M J, Moon B, Ailamaki A, et al. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, Wisconsin: ACM, 2002: 394-405.
  • 6 Pei J, Zhang X L, Cho M J, et al. MaPle: a fast algorithm for maximal pattern-based clustering[C]//Proceedings of the Third IEEE International Conference on Data Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society, 2003: 259-266.
  • 7 Ben-Dor A, Chor B, Karp R, et al. Discovering local structure in gene expression data: the order-preserving submatrix problem[C]//Proceedings of the Sixth Annual International Conference on Computational Biology. Washington DC, USA: ACM, 2002: 49-57.
  • 8 Liu J Z, Wang W. OP-Cluster: Clustering by tendency in high dimensional space[C]//Proceedings of the Third IEEE International Conference on Data Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society, 2003: 187-194.
  • 9 Aggarwal C C, Hinneburg A, Keim D. On the surprising behavior of distance metrics in high dimensional space[C]//Bussche J V, Vianu V. The 8th International Conference on Database Theory. London, UK: Lecture Notes in Computer Science, 2001: 420-434.
  • 10 Agrawal R, Gehrke J. Automatic subspace clustering of high dimensional data for data mining applications[C]//Haas L M, Tiwary A. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle, WA, USA: ACM Press, 1998: 94-105.


