Abstract
Biclustering is an unsupervised learning method used to analyze gene expression data. Some traditional biclustering methods cope poorly with the coherent fluctuation patterns (consistent volatility) found in gene expression data. To address this shortcoming and to obtain larger biclusters, this paper introduces the upper and lower approximations of rough set theory into fuzzy biclustering. Combining the rough upper and lower approximation sets with the weighted mean square residue yields a new rough mean square residue, on which an improved fuzzy biclustering algorithm is built. For a gene expression dataset, the algorithm first fills in missing values, then reduces the dimension of the dataset with non-negative matrix factorization (NMF), and finally computes the rough mean square residue of the data matrix. Rows and columns are then deleted or added according to a comprehensive evaluation measure function and the nearness-degree principle, producing biclusters of larger volume. Experimental results show that the proposed fuzzy biclustering algorithm is effective.
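The preprocessing steps and the residue score described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. Row-mean imputation stands in for the unspecified missing-value method; NMF uses the standard Lee-Seung multiplicative updates and assumes a non-negative input; `mean_squared_residue` is Cheng and Church's mean squared residue. The function `rough_msr`, its membership thresholds, and its weights are hypothetical, because the abstract does not give the exact formula of the rough mean square residue. The comprehensive evaluation measure and the nearness-degree rules for deleting and adding rows and columns are not reproduced.

```python
import numpy as np


def fill_missing_row_mean(X):
    """Replace NaNs with the mean of the observed values in the same row.
    Assumption: a stand-in for the paper's (unspecified) imputation step."""
    X = np.array(X, dtype=float)
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_means[rows]
    return X


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||X - W H||_F.
    X must be non-negative; returns W (m x rank) and H (rank x n)."""
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update W with H fixed
    return W, H


def mean_squared_residue(A):
    """Cheng-Church mean squared residue H(I, J) of a submatrix A."""
    A = np.asarray(A, dtype=float)
    residue = (A - A.mean(axis=1, keepdims=True)
               - A.mean(axis=0, keepdims=True) + A.mean())
    return float((residue ** 2).mean())


def rough_msr(A, row_mu, col_mu, alpha_lower=0.7, alpha_upper=0.3,
              w_lower=0.7, w_upper=0.3):
    """HYPOTHETICAL rough residue: a weighted sum of the residues of the
    lower-approximation and upper-approximation submatrices.
    row_mu / col_mu are fuzzy memberships of rows / columns in the bicluster."""
    A = np.asarray(A, dtype=float)
    row_mu, col_mu = np.asarray(row_mu), np.asarray(col_mu)
    lo_r, up_r = row_mu >= alpha_lower, row_mu >= alpha_upper
    lo_c, up_c = col_mu >= alpha_lower, col_mu >= alpha_upper
    if not up_r.any() or not up_c.any():
        raise ValueError("upper approximation is empty")
    h_up = mean_squared_residue(A[np.ix_(up_r, up_c)])
    if not lo_r.any() or not lo_c.any():
        return h_up  # empty lower approximation: use the upper term only
    h_lo = mean_squared_residue(A[np.ix_(lo_r, lo_c)])
    return w_lower * h_lo + w_upper * h_up


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((6, 4))
    X[0, 1] = np.nan
    X = fill_missing_row_mean(X)
    W, H = nmf(X, rank=2)
    print("reduced gene representation shape:", W.shape)
    print("rough MSR:", rough_msr(X, rng.random(6), rng.random(4)))
```

A purely additive submatrix (each entry equals a row offset plus a column offset) has a mean squared residue of zero, which is why low residue scores favor biclusters whose rows fluctuate coherently across conditions.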
Source
Journal of Henan Normal University (Natural Science Edition)
Indexed in: CAS, Peking University Core Journals
2017, No. 5, pp. 93-100 (8 pages)
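Funding
National Natural Science Foundation of China (61402153, 61602158)
China Postdoctoral Science Foundation (2016M602247)
Key Scientific Research Project Plan of Higher Education Institutions of Henan Province (14A520069)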
Keywords
rough set
rough mean square residue
biclustering