Abstract
Gene expression data are noisy and highly redundant, and the traditional NMF algorithm clusters them inefficiently. To address this, a clustering algorithm is proposed that combines β-divergence matrix factorization under a smoothed l_0-norm constraint with K-means, and it is applied to gene expression data. The smoothed l_0 norm is introduced into the objective function of the β-divergence-based matrix factorization so that useful feature information can be extracted for clustering. In the experimental comparison, the improved algorithm reaches an average clustering accuracy of 70%, 11% higher than that of the traditional NMF clustering algorithm. Its clustering performance is also clearly better than that of the other methods compared.
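The abstract names the ingredients (β-divergence factorization, a smoothed l_0 penalty, K-means on the extracted features) but does not give the update rules. The following is a minimal NumPy sketch under stated assumptions, not the authors' algorithm. It assumes the common Gaussian-kernel surrogate sum(1 - exp(-h^2 / (2 sigma^2))) for the smoothed l_0 norm and applies it only to the coefficient matrix H. It uses heuristic multiplicative updates in which the penalty gradient is added to the denominator, so monotone descent is not guaranteed. Samples are then clustered by K-means on the column-normalized H. All function names and default parameter values (beta, lam, sigma) are hypothetical.

```python
import itertools
import numpy as np


def beta_divergence(V, Vh, beta, eps=1e-12):
    """Sum over all entries of d_beta(V | Vh); beta=2: squared Euclidean / 2,
    beta=1: generalized Kullback-Leibler, beta=0: Itakura-Saito."""
    V, Vh = V + eps, Vh + eps
    if beta == 1:
        return float(np.sum(V * np.log(V / Vh) - V + Vh))
    if beta == 0:
        return float(np.sum(V / Vh - np.log(V / Vh) - 1))
    return float(np.sum((V**beta + (beta - 1) * Vh**beta
                         - beta * V * Vh**(beta - 1)) / (beta * (beta - 1))))


def smoothed_l0(H, sigma):
    """Assumed Gaussian-kernel surrogate of the l0 count of H."""
    return float(np.sum(1.0 - np.exp(-H**2 / (2.0 * sigma**2))))


def beta_nmf_sl0(V, k, beta=1.0, lam=0.01, sigma=0.5, n_iter=300, seed=0, eps=1e-12):
    """Heuristic multiplicative updates for
    min d_beta(V | WH) + lam * smoothed_l0(H), with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(n_iter):
        Vh = W @ H + eps
        # The smoothed-l0 gradient is >= 0 for H >= 0, so it joins the
        # denominator of the standard beta-NMF update for H.
        pen = lam * (H / sigma**2) * np.exp(-H**2 / (2.0 * sigma**2))
        H *= (W.T @ (Vh**(beta - 2) * V)) / (W.T @ Vh**(beta - 1) + pen + eps)
        Vh = W @ H + eps
        W *= ((Vh**(beta - 2) * V) @ H.T) / (Vh**(beta - 1) @ H.T + eps)
        # Fix the scale ambiguity (unit l1 columns of W) so the penalty on H
        # cannot be sidestepped by shrinking H and inflating W.
        scale = W.sum(axis=0) + eps
        W /= scale
        H *= scale[:, None]
    return W, H


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's K-means on the rows of X; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def cluster_samples(V, k, **nmf_kwargs):
    """V is genes x samples; K-means on the per-sample columns of H."""
    _, H = beta_nmf_sl0(V, k, **nmf_kwargs)
    Hn = H / (H.sum(axis=0, keepdims=True) + 1e-12)  # per-sample proportions
    return kmeans(Hn.T, k)


def clustering_accuracy(true, pred, k):
    """Best accuracy over all label permutations (fine for small k)."""
    return max(np.mean(np.array(p)[pred] == true)
               for p in itertools.permutations(range(k)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true = np.repeat(np.arange(3), 10)            # 30 samples in 3 groups
    V = 0.1 + 0.05 * rng.random((30, 30))         # 30 genes x 30 samples
    for g in range(3):
        V[10 * g:10 * g + 10, true == g] += 5.0   # group-specific genes
    pred = cluster_samples(V, 3, beta=1.0)
    print(clustering_accuracy(true, pred, 3))
```

Setting beta=2 or beta=0 switches the fit to squared Euclidean or Itakura-Saito. In this sketch, lam controls how strongly small entries of H are pushed toward zero, and sigma sets the scale below which an entry is treated as effectively zero.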
Authors
CUI Jian, YOU Chunzhi (Basic Medicine Department, Fenyang College, Shanxi Medical University, Luliang, Shanxi 032200, China)
Source
Journal of Chongqing Technology and Business University (Natural Science Edition)
2018, No. 2, pp. 31-35 (5 pages)
Keywords
gene expression data
beta divergence
clustering
matrix decomposition