CIS: An Iterative Spread-Based Algorithm for Clustering Microarray Data

Original title: CIS:一种基于迭代扩张的微阵列数据聚类算法
Abstract: DNA microarray technology makes it possible to monitor the expression levels of tens of thousands of genes simultaneously. Traditional clustering algorithms suffer from the curse of dimensionality when applied directly to high-dimensional gene expression data. Feature transformation and feature selection are the two common ways to reduce dimensionality, but the former produces new features that are hard to interpret with the original domain knowledge, and the latter usually loses information. In addition, traditional clustering algorithms usually require user-specified parameters, and different settings can lead to very different clustering results. To address these problems, this paper proposes CIS, a new iterative spread-based algorithm for clustering microarray data. CIS uses neither feature transformation nor feature selection, and it determines the clustering parameters automatically. In a progressively refining manner, CIS repeatedly partitions the genes using the most recent sample clusters as features, and then re-partitions the samples using the new gene clusters as features. The final result is easy to interpret and avoids information loss, and the method reduces experimental error caused by users' lack of domain knowledge. CIS was applied to two real microarray data sets, and the experimental results confirm its effectiveness and efficiency.
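Authors: Wang Xiaoming (王晓明), Yin Ying (印莹)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2007, No. 8, pp. 171-176 (6 pages)
Keywords: microarray, clustering, dimensionality reduction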
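
The alternating scheme described in the abstract (cluster genes using sample clusters as features, then cluster samples using gene clusters as features, and repeat) can be illustrated with a minimal sketch. This is not the authors' CIS implementation: the abstract does not give the spreading rule or the automatic threshold selection, so the sketch substitutes plain k-means with user-supplied cluster counts, starts from a gene clustering on the raw samples, and summarises each cluster by its mean. The names `kmeans`, `cluster_means`, `iterative_two_way_clustering`, `k_genes` and `k_samples` are all hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd k-means on the rows of X (a stand-in, not the CIS step).

    Uses deterministic farthest-point initialisation so results are repeatable.
    """
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every row to its nearest chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def cluster_means(X, col_labels, k):
    """Collapse the columns of X into k features, one mean per column cluster."""
    feats = np.zeros((X.shape[0], k))
    for j in range(k):
        cols = col_labels == j
        if cols.any():  # leave an all-zero feature for an empty cluster
            feats[:, j] = X[:, cols].mean(axis=1)
    return feats


def iterative_two_way_clustering(expr, k_genes, k_samples, max_rounds=10):
    """Alternate gene and sample clustering on a genes x samples matrix.

    Assumption: cluster counts are fixed by the caller, whereas CIS is
    described as choosing its parameters automatically.
    """
    # Assumed starting point: cluster genes with the raw samples as features.
    gene_labels = kmeans(expr, k_genes)
    sample_labels = None
    for _ in range(max_rounds):
        # Samples described by their mean expression over each gene cluster.
        sample_feats = cluster_means(expr.T, gene_labels, k_genes)
        new_sample_labels = kmeans(sample_feats, k_samples)
        # Genes described by their mean expression over each sample cluster.
        gene_feats = cluster_means(expr, new_sample_labels, k_samples)
        new_gene_labels = kmeans(gene_feats, k_genes)
        converged = (
            sample_labels is not None
            and np.array_equal(new_sample_labels, sample_labels)
            and np.array_equal(new_gene_labels, gene_labels)
        )
        gene_labels, sample_labels = new_gene_labels, new_sample_labels
        if converged:
            break
    return gene_labels, sample_labels
```

Because every feature in this sketch is a mean over a named cluster of original genes or samples, the reduced representation stays readable in terms of the original data, which is the interpretability argument the abstract makes against feature transformation. Whether CIS summarises clusters by their means is not stated in the abstract.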