Abstract
This paper proposes a non-parametric clustering algorithm for gene expression data. The algorithm combines fuzzy clustering of multi-dimensional data with coupled two-way clustering (CTWC) and introduces a norm-based method to further improve and justify the approach. Applied to a real colon cancer gene expression dataset, the algorithm identified a signature combination of 8 genes that not only identifies colon cancer samples with about 90% accuracy but also distinguishes subtypes among the colon cancer samples. The experimental results demonstrate the feasibility of the algorithm.
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2005, No. 6, pp. 1388-1391 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (60273079, 60473074)
Keywords
gene expression data
two-way clustering
fuzzy clustering
norm
non-parametric clustering