A Fuzzy Biclustering Approach Based on Rough Average Square Residue (一种基于粗糙均方残基的模糊双聚类方法)

Cited by: 4

Abstract: Biclustering is an unsupervised learning method used to analyze gene expression data. To obtain larger biclusters, and to address the weakness of traditional biclustering methods in capturing coherent fluctuations in gene expression data, this paper introduces the upper and lower approximations of rough set theory into a fuzzy biclustering algorithm. Combining the rough upper and lower approximations with the weighted mean square residue yields a new rough mean square residue, on which a fuzzy biclustering algorithm is built. For a gene expression dataset, missing values are first imputed; next, non-negative matrix factorization is used to reduce the dimensionality of the data; finally, the rough mean square residue of the data matrix is computed, and rows and columns are deleted and added according to a comprehensive evaluation measure function and the nearness-degree principle, producing biclusters of larger capacity. Experimental results show that the proposed fuzzy biclustering algorithm is effective.
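The abstract does not define the rough mean square residue itself, so the sketch below covers only the plain (unweighted) mean squared residue of Cheng and Church, the standard coherence score in biclustering, together with one greedy row-deletion step. It is an illustration under stated assumptions, not the authors' algorithm: the function names `mean_squared_residue` and `worst_row` and the toy matrix are hypothetical, and the rough approximations, weighting, fuzzy membership, imputation, and NMF steps are all omitted.

```python
from typing import Sequence


def mean_squared_residue(data: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Classic mean squared residue of the submatrix data[rows][cols].

    A residue of 0 means every row is an additive shift of every other row.
    """
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(data[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / n_r for j in cols}
    all_mean = sum(row_mean.values()) / n_r
    total = 0.0
    for i in rows:
        for j in cols:
            r = data[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += r * r
    return total / (n_r * n_c)


def worst_row(data: Sequence[Sequence[float]],
              rows: Sequence[int],
              cols: Sequence[int]) -> int:
    """Return the row index with the largest mean squared residue contribution.

    Deleting this row is one greedy step toward a more coherent bicluster.
    """
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(data[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / n_r for j in cols}
    all_mean = sum(row_mean.values()) / n_r

    def score(i: int) -> float:
        return sum((data[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
                   for j in cols) / n_c

    return max(rows, key=score)


# Hypothetical toy data: rows 0-2 are additive shifts of each other, row 3 is not.
toy = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [5.0, 1.0, 9.0],
]
all_rows, all_cols = [0, 1, 2, 3], [0, 1, 2]
msr_before = mean_squared_residue(toy, all_rows, all_cols)
noisy = worst_row(toy, all_rows, all_cols)
kept_rows = [i for i in all_rows if i != noisy]
msr_after = mean_squared_residue(toy, kept_rows, all_cols)
```

On this toy matrix the incoherent row is the one selected for deletion, and the residue of the remaining rows drops to zero. A full method would alternate such deletions with additions of rows and columns under a residue threshold, which is how larger biclusters are grown.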

Authors: Sun Lin (孙林), Liu Ruonan (刘弱南), Zhang Xiaoyu (张霄雨), Sun Yinjie (孙印杰), Song Liming (宋黎明)

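Affiliation: College of Computer and Information Engineering, Henan Normal University; Engineering Technology Research Center for Computational Intelligence and Data Mining of Henan Province Universities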

Source: Journal of Henan Normal University (Natural Science Edition), 2017, No. 5, pp. 93-100 (8 pages). Indexed in: CAS; Peking University Core Journals (北大核心)

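Funding: National Natural Science Foundation of China (61402153, 61602158); China Postdoctoral Science Foundation (2016M602247); Key Scientific Research Project of Higher Education Institutions of Henan Province (14A520069)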

Keywords: rough set; rough average square residue; biclustering

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
