Abstract
Cluster analysis is one of the main methods for extracting biomedical information from gene expression profile data. Because traditional spectral clustering algorithms cannot determine the number of clusters, an improved spectral clustering algorithm is proposed and applied to the cluster analysis of gene expression profiles. The algorithm first constructs a normalized Laplacian matrix from the gene expression profile data and performs an eigenvalue decomposition to obtain the corresponding eigenvalues and eigenvectors; the difference between adjacent eigenvalues is described by the eigengap. The number of clusters is then determined by locating the maximum of the eigengap sequence. Finally, the data are partitioned into clusters directly from the unit (normalized) eigenvectors. Experiments on simulated data and cancer data demonstrate the effectiveness of the algorithm.
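The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions not stated in the abstract: a Gaussian affinity with a hypothetical kernel width `sigma`, the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2), and an eigengap search restricted to the `k_max` smallest eigenvalues. For the final partition the sketch swaps in k-means on the row-normalized eigenvector matrix (Ng-Jordan-Weiss style); the paper's own way of dividing the data from the unit eigenvectors may differ. All function names (`normalized_laplacian`, `estimate_k`, `kmeans`, `spectral_cluster`) are hypothetical.

```python
import numpy as np


def normalized_laplacian(X, sigma=1.0):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} built from
    a Gaussian affinity matrix W (sigma is an assumed kernel width)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def estimate_k(eigvals, k_max):
    """Eigengap heuristic on ascending eigenvalues: the gaps are
    lambda_{i+1} - lambda_i, and k is the index of the largest gap."""
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1


def kmeans(U, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation
    (a substitute for the paper's partition step, not taken from it)."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(X, k_max=10, sigma=1.0):
    """Return (estimated number of clusters, cluster label per sample)."""
    L = normalized_laplacian(X, sigma)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    k = estimate_k(eigvals, min(k_max, len(X) - 1))
    U = eigvecs[:, :k]
    # Scale each row to unit length before partitioning.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return k, kmeans(U, k)


if __name__ == "__main__":
    # Toy data: three well-separated 2-D blobs (not gene expression data).
    rng = np.random.default_rng(0)
    centres = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
    X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centres])
    k, labels = spectral_cluster(X)
    print("estimated k:", k)
```

When the clusters are well separated, the normalized Laplacian has roughly one near-zero eigenvalue per cluster followed by a jump. Taking the position of the largest gap in the ascending eigenvalue sequence therefore recovers the cluster count without fixing it in advance, which is the problem the abstract targets.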
Source
《安徽大学学报(自然科学版)》
CAS
Peking University Core Journal (北大核心)
2012, Issue 5, pp. 67-72 (6 pages)
Journal of Anhui University(Natural Science Edition)
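Funding
National Natural Science Foundation of China (60772121)
Natural Science Foundation of Anhui Province (1208085MF93)
Academic Innovation Team Fund of the Anhui University "211 Project" (KJTD007A)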