Abstract
Cluster analysis of cancer gene expression data can provide a basis for early cancer diagnosis and accurate cancer subtyping. To suit the characteristics of cancer gene expression data, a biclustering algorithm called OMB (Override Matrix Bicluster) is proposed. OMB searches the rows and the columns of the gene expression data matrix for those below a threshold and uses a deletion-addition algorithm to generate a submatrix. It then builds an override (covering) matrix of the same size as the gene expression matrix, which marks the position of the submatrix produced in the previous iteration, and repeats the greedy iterative search on the marked matrix until K clustering results are found. Matlab experiments show that OMB achieves better clustering results on cancer gene expression data with overlapping structure.
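The abstract does not give OMB's scoring function or say exactly how the override matrix enters later iterations, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the "below threshold" test is Cheng and Church's mean squared residue (MSR) compared with a threshold `delta`. It also assumes the override matrix is a per-cell coverage count added to the score with a weight `lam`, so later searches are pushed off cells that are already covered while rows or columns may still be shared between biclusters. Deletion and addition are simple brute-force greedy steps. The names `omb_sketch`, `score`, `delta` and `lam` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: OMB's exact score and its use of the override matrix
# are not specified in the abstract; both are assumptions here.
Matrix = List[List[float]]
Bicluster = Tuple[List[int], List[int]]


def score(a: Matrix, cover: List[List[int]], rows: Sequence[int],
          cols: Sequence[int], lam: float) -> float:
    """Cheng-Church mean squared residue of a[rows][cols], plus lam times
    the mean coverage count of its cells (assumed penalty)."""
    rmean = {i: sum(a[i][j] for j in cols) / len(cols) for i in rows}
    cmean = {j: sum(a[i][j] for i in rows) / len(rows) for j in cols}
    amean = sum(rmean.values()) / len(rows)
    total = sum((a[i][j] - rmean[i] - cmean[j] + amean) ** 2 + lam * cover[i][j]
                for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def omb_sketch(a: Matrix, k: int, delta: float, lam: float = 1.0) -> List[Bicluster]:
    """Find k biclusters by greedy deletion/addition, marking each result in a
    coverage ("override") matrix that penalizes reuse of covered cells."""
    n, m = len(a), len(a[0])
    cover = [[0] * m for _ in range(n)]  # same size as the expression matrix
    found: List[Bicluster] = []
    for _ in range(k):
        rows, cols = list(range(n)), list(range(m))
        # Deletion: drop the single row or column whose removal gives the lowest
        # score, until the score is within delta; no dimension shrinks below 2.
        while score(a, cover, rows, cols, lam) > delta:
            candidates: List[Bicluster] = []
            if len(rows) > 2:
                candidates += [([r for r in rows if r != i], cols) for i in rows]
            if len(cols) > 2:
                candidates += [(rows, [c for c in cols if c != j]) for j in cols]
            if not candidates:
                break
            rows, cols = min(candidates, key=lambda rc: score(a, cover, rc[0], rc[1], lam))
        # Addition: add any column, then any row, that keeps the score within delta.
        for j in range(m):
            if j not in cols and score(a, cover, rows, sorted(cols + [j]), lam) <= delta:
                cols = sorted(cols + [j])
        for i in range(n):
            if i not in rows and score(a, cover, sorted(rows + [i]), cols, lam) <= delta:
                rows = sorted(rows + [i])
        # Mark the new bicluster in the coverage matrix for the next iteration.
        for i in rows:
            for j in cols:
                cover[i][j] += 1
        found.append((rows, cols))
    return found
```

With `lam = 0` the coverage matrix has no effect and every iteration returns the same bicluster. Some penalty is therefore needed; Cheng and Church's original method instead masks found biclusters with random values. A larger `lam` pushes later biclusters further away from cells already covered.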
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD (Chinese Science Citation Database)
Peking University Core Journals (北大核心)
2011, No. 28, pp. 237-240 (4 pages)
Funding
Natural Science Research Program of the Hebei Provincial Department of Education (No. 2009339)