Abstract
To obtain the complete data matrix that downstream analysis of tumor gene expression profiles requires, a missing value estimation method (FCM_MC) that combines matrix completion (MC) with the fuzzy c-means algorithm (FCM) is proposed for data sets containing missing entries. The method exploits the redundancy in tumor gene expression data: fuzzy c-means clustering first yields gene semantic segments with good low-rank properties, and matrix completion then reconstructs the missing entries of each segment separately. Experiments on several data sets compare the method with traditional missing value estimation algorithms. The results show that FCM_MC improves both estimation accuracy and preservation of class structure while remaining efficient to run.
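The two-stage idea in the abstract (cluster the genes, then complete each cluster as a low-rank matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how clustering handles missing entries, how genes are assigned to segments, or which completion solver is used. Here missing entries are filled with column means before clustering, each gene goes to its highest-membership cluster, and completion uses iterative SVD soft-thresholding (a SoftImpute-style stand-in). The function names `fuzzy_c_means`, `soft_impute`, and `fcm_mc` and the parameters `c`, `m`, and `lam` are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means on the rows of X; returns memberships and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = d ** (-2.0 / (m - 1.0))           # u_ij proportional to d_ij^(-2/(m-1))
        U_new /= U_new.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def soft_impute(M, mask, lam=0.1, n_iter=200, tol=1e-6):
    """Low-rank completion by repeated SVD soft-thresholding (assumed solver)."""
    Z = np.where(mask, M, 0.0)                    # start with zeros in the gaps
    for _ in range(n_iter):
        filled = np.where(mask, M, Z)             # observed values stay fixed
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        done = np.linalg.norm(Z_new - Z) <= tol * max(np.linalg.norm(Z), 1.0)
        Z = Z_new
        if done:
            break
    return np.where(mask, M, Z)

def fcm_mc(X, c=2, lam=0.1):
    """Cluster genes (rows) with FCM, then complete each cluster separately."""
    mask = ~np.isnan(X)                           # True where a value was observed
    X0 = np.where(mask, X, np.nanmean(X, axis=0)) # assumed pre-fill: column means
    U, _ = fuzzy_c_means(X0, c)
    labels = U.argmax(axis=1)                     # assumed hard assignment
    out = X0.copy()
    for k in range(c):
        rows = np.where(labels == k)[0]
        if rows.size:
            out[rows] = soft_impute(X[rows], mask[rows], lam)
    return out
```

Calling `fcm_mc(X, c=2)` on a genes-by-samples array with `np.nan` at the missing positions returns an array of the same shape with the gaps filled and the observed values unchanged. The number of clusters and the threshold `lam` would need tuning for each data set.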
Source
Science Technology and Engineering (《科学技术与工程》)
PKU Core Journal (北大核心)
2017, No. 7, pp. 63-68, 89 (7 pages)
Funding
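National Natural Science Foundation of China (51365017, 61305019)
Science and Technology Program of the Jiangxi Provincial Department of Education (GJJ150680)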
Keywords
matrix completion
fuzzy c-means
low rank
gene semantics
missing value estimation