Abstract
Clustering gene expression data is an important means of analyzing co-regulation relationships among genes. Mining sequences in subspaces whose expression values differ but whose trends are conserved has become one of the main research topics in gene expression data clustering. Based on the definition of N-same dimensional tendency similarity, a high-dimensional clustering algorithm for gene expression data, Gen-Cluster, is proposed. Gen-Cluster first transforms gene expression values into sequence form and then mines N-same dimensional tendency patterns bottom-up, using a sequential pattern mining strategy with no duplicate projection and no candidate generation. It also handles sequential patterns that contain item sets, which the OP-Cluster algorithm cannot, and finally yields N-same dimensional tendency clusters formed by gene sequences whose expression trends are conserved. Experiments on the Breast Tumor and MicroRNA expression data sets indicate that the mining results are valid, and that Gen-Cluster is more efficient than OP-Cluster while covering its results.
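The first step named in the abstract, turning expression values into sequence form, can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `to_sequence`, the grouping threshold `delta`, and the rule of grouping conditions whose values lie within `delta` of the smallest value in the group are all assumptions made for illustration. The abstract itself does not define how item sets are formed.

```python
from typing import List, Sequence, Tuple


def to_sequence(values: Sequence[float], delta: float = 0.0) -> List[Tuple[int, ...]]:
    """Turn one gene's expression row into an ordered sequence of item sets.

    Condition indices are ordered by ascending expression value. Conditions
    whose values lie within `delta` of the first value in the current group
    are placed in the same item set (hypothetical grouping rule).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups: List[List[int]] = []
    for i in order:
        if groups and values[i] - values[groups[-1][0]] <= delta:
            groups[-1].append(i)   # near-tie: same item set
        else:
            groups.append([i])     # clearly larger value: new item set
    return [tuple(sorted(g)) for g in groups]


# Two genes with very different magnitudes but the same trend
# map to the same sequence of condition indices.
gene_a = [5.0, 1.0, 3.0]
gene_b = [50.0, 10.0, 30.0]
same_trend = to_sequence(gene_a) == to_sequence(gene_b)
```

Under these assumptions, genes whose sequences share a common subsequence over some subset of conditions would be candidates for the same tendency cluster; the mining of such shared patterns is what the sequential pattern mining step in the abstract addresses.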
Source
《复旦学报(自然科学版)》
CAS
CSCD
PKU Core Journals (北大核心)
2008, Issue 2, pp. 135-146 (12 pages)
Journal of Fudan University: Natural Science
Funding
National Natural Science Foundation of China (Grant No. 60573093)
National 863 Program of China (Grant No. 2006AA02Z329)
Keywords
high-dimensional data mining
clustering
gene expression data
N-same dimensional tendency similarity