CIS: An Iterative Spread-based Algorithm for Clustering Micro-array Data

Abstract: DNA microarray technology makes it possible to monitor the expression levels of tens of thousands of genes simultaneously. Applying traditional clustering algorithms directly to such high-dimensional gene expression data runs into the curse of dimensionality. Feature transformation and feature selection are the two common ways to reduce dimensionality, but the new features produced by the former are hard to interpret with existing domain knowledge, and the latter usually loses information. In addition, traditional clustering algorithms usually require user-specified parameters, and different settings can lead to very different clustering results. To address these problems, this paper proposes CIS, a new iterative spread-based algorithm for clustering microarray data. CIS uses neither feature transformation nor feature selection, and it determines its clustering parameters automatically. It repeatedly clusters the genes using the most recent sample clusters as features, then re-clusters the samples using the new gene clusters as features, refining the result step by step. The final result is easy to interpret and avoids information loss, and the method reduces experimental error caused by a user's lack of domain knowledge. CIS is applied to two real microarray data sets, and the experimental results confirm its effectiveness and efficiency.
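The alternating refinement described in the abstract can be illustrated with a small sketch. This is not the CIS algorithm itself: the abstract does not specify the spread step or how parameters are chosen automatically, so the sketch assumes a plain k-means with fixed, user-supplied cluster counts as a stand-in. All function names (`kmeans`, `cluster_means`, `iterative_two_way`) and the toy matrix are hypothetical and chosen only for illustration.

```python
from statistics import fmean

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20):
    """Plain k-means (a stand-in for CIS's own clustering step).
    Deterministic farthest-point seeding; returns one label per row."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                  for r in rows]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels

def cluster_means(matrix, col_labels, k):
    """Describe each row by its mean value within each column cluster."""
    return [[fmean([v for v, lab in zip(row, col_labels) if lab == c] or [0.0])
             for c in range(k)]
            for row in matrix]

def transpose(m):
    return [list(col) for col in zip(*m)]

def iterative_two_way(expr, k_genes, k_samples, rounds=5):
    """expr[g][s] is the expression of gene g in sample s.
    Alternates gene and sample clustering until the sample labels repeat."""
    sample_labels = kmeans(transpose(expr), k_samples)  # initial sample clusters
    gene_labels = []
    for _ in range(rounds):
        # Cluster genes, using the current sample clusters as features.
        gene_labels = kmeans(cluster_means(expr, sample_labels, k_samples), k_genes)
        # Re-cluster samples, using the new gene clusters as features.
        new_labels = kmeans(
            cluster_means(transpose(expr), gene_labels, k_genes), k_samples)
        if new_labels == sample_labels:
            break
        sample_labels = new_labels
    return gene_labels, sample_labels

# Toy matrix (made up): genes 0-1 are high in samples 0-1, genes 2-3 in samples 2-3.
expr = [[9, 8, 1, 2],
        [8, 9, 2, 1],
        [1, 2, 9, 8],
        [2, 1, 8, 9]]
print(iterative_two_way(expr, k_genes=2, k_samples=2))
# → ([0, 0, 1, 1], [0, 0, 1, 1])
```

Note that each side is described by cluster-level averages of the other side rather than by a selected subset of features or a learned projection, which is the property the abstract emphasises. The fixed `k_genes` and `k_samples` are exactly the kind of user-specified parameter that CIS is reported to determine automatically.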

Authors: Wang Xiaoming (王晓明), Yin Ying (印莹)

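Affiliations: School of Electronics and Information Engineering (电信学院), University of Science and Technology Liaoning (辽宁科技大学); Northeastern University (东北大学)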

Source: Computer Science (《计算机科学》), 2007, No. 8, pp. 171-176 (6 pages). Indexed in CSCD and the Peking University Core Journals list.

Keywords: Micro-array, Clustering, Dimensionality reduction

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]
