Abstract
In exploratory cluster analysis of gene expression data, determining the number of clusters is a key factor in clustering quality. Many cluster validity indices and methods can be used with the Partitioning Around Medoids (PAM) clustering algorithm. This paper discusses seven commonly used indices and methods suited to PAM and compares their performance experimentally on simulated and real gene expression data exhibiting four different types of cluster structure. The results show that the system evolution method and the stability-based method perform best at estimating the number of clusters, with accuracies of 100% and 90%, respectively.
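As an illustration of what estimating the number of clusters for PAM can look like, the following minimal sketch runs a simplified k-medoids search for several values of k and keeps the k with the highest mean silhouette width. Several things here are assumptions rather than content of the paper: the abstract does not name its seven methods, so the silhouette index is only a stand-in and is not the system evolution or stability-based method; `pam` is a swap-only simplification without the usual BUILD initialisation; and the names `pam`, `mean_silhouette`, `estimate_k` and the 2-D toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM (assumption: swap phase only, first k points as start)."""
    medoids = list(range(k))
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing medoid i with a non-medoid candidate.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels


def mean_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for idx, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[idx], points[j]) for j in own if j != idx)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(points[idx], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_range):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, pam(points, k)[1]) for k in k_range}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated groups of five 2-D points.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
centres = [(0, 0), (10, 10), (20, 0)]
toy = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

best_k, scores = estimate_k(toy, range(2, 6))
```

Real gene expression data would need a suitable dissimilarity (for example a correlation-based distance) and a more careful initialisation; the sketch only shows the overall loop of clustering for each candidate k and scoring the result.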
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal
2008, No. 9, pp. 198-199, 202 (3 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 60574039, 60371044)
Keywords
cluster validation
estimation of the number of clusters
cluster analysis
gene expression data