Abstract
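Objective To explore methods for evaluating the results of cluster analysis of gene expression data and to provide a criterion for identifying the best clustering result. Methods Clustering results are judged from two perspectives: data structure (internal information) and functional classification (external information). On the one hand, the Entropy (information entropy) method is used to examine how well the clustering results agree with the classification of a subset of genes with known function; on the other hand, the adjust-FOM method is used to evaluate the results from the structure of the data itself. We combine the two methods into a new evaluation method, which we call the Entropy-FOM evaluation method. Results The method was applied to evaluate clustering results on Lyer's serum data set and Ferea's yeast data set, and adjust-FOM and Entropy-FOM plots are given for six clustering methods. Conclusion Extensive computational results suggest ...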
Objective Many clustering algorithms have been used to analyze gene expression data. However, little guidance is available for evaluating and choosing among these algorithms. In this study, our purpose is to establish a systematic framework for selecting the best clustering algorithm and to provide an evaluation method for clustering analysis of gene expression data. Methods Based on data structure (internal information) and functional classification (external information), clustering results for gene expression data are evaluated by two approaches. First, to examine the predictive power of clustering algorithms, Entropy is used to measure the consistency between the clustering results from different algorithms and known, validated functional classifications (the external classification information). Second, a modified figure of merit (adjust-FOM) is used as an internal assessment method. In this method, a clustering algorithm is applied to the data from all experimental conditions but one, and the remaining condition is used to assess the predictive power of the resulting clusters. Results In this study, we propose a method based on entropy and figure of merit (FOM) to assess the results obtained by different algorithms. Six clustering algorithms were evaluated using three gene expression data sets (Lyer's serum data sets and Ferea's Saccharomyces cerevisiae data set). Conclusion According to the adjust-FOM and Entropy-FOM curves, both the SOM and the Fuzzy clustering methods show the highest clustering ability on the three data sets.
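The two measures described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `cluster_entropy` and `adjusted_fom` are hypothetical, the entropy is assumed to be the size-weighted average of per-cluster class entropies, and the adjustment is assumed to be the commonly used division by sqrt((n - k) / n) for n genes and k clusters. The abstract does not state how the two measures are combined into Entropy-FOM, so that step is not shown.

```python
import numpy as np


def cluster_entropy(labels, classes):
    """Size-weighted average entropy (in bits) of known classes within clusters.

    Lower values mean the clusters agree better with the known classification.
    """
    labels = np.asarray(labels)
    classes = np.asarray(classes)
    n = len(labels)
    total = 0.0
    for c in np.unique(labels):
        members = classes[labels == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        total += (len(members) / n) * -(p * np.log2(p)).sum()
    return total


def adjusted_fom(data, cluster_fn, k):
    """Leave-one-condition-out figure of merit, summed over all conditions.

    data: genes x conditions matrix.
    cluster_fn(training, k): hypothetical callback returning one label per gene.
    Assumed adjustment: divide the aggregate FOM by sqrt((n - k) / n).
    """
    data = np.asarray(data, dtype=float)
    n_genes, n_conditions = data.shape
    total = 0.0
    for e in range(n_conditions):
        # Cluster on every condition except e.
        training = np.delete(data, e, axis=1)
        labels = np.asarray(cluster_fn(training, k))
        # Measure within-cluster scatter on the held-out condition e.
        held_out = data[:, e]
        sq_dev = 0.0
        for c in np.unique(labels):
            values = held_out[labels == c]
            sq_dev += ((values - values.mean()) ** 2).sum()
        total += np.sqrt(sq_dev / n_genes)
    return total / np.sqrt((n_genes - k) / n_genes)
```

Any clustering routine that maps a genes x conditions matrix and a cluster count to one label per gene can be passed as `cluster_fn`, so the same two functions could be reused to compare several algorithms over a range of k. Lower values of either measure indicate a better clustering under these assumptions.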
Source
《中国卫生统计》
CSCD
PKU Core Journals (北大核心)
2002, No. 6, pp. 332-335 (4 pages)
Chinese Journal of Health Statistics