
基于网络约束双聚类的癌症亚型分类 (Cancer Subtype Classification Based on Network-Constrained Bi-Clustering) Cited by: 5

Network Regularized Bi-Clustering for Cancer Subtype Categorization
Abstract Identifying cancer subtypes is important for analyzing tumor heterogeneity. Bi-clustering groups large-scale gene expression data along the gene and sample dimensions simultaneously, finding bi-clusters in which a subset of samples shows similar expression over a subset of genes. These bi-clusters point to the corresponding cancer subtypes and provide important information for applications such as precise gene therapy. Incorporating gene interaction network data can further improve the accuracy of cancer subtype classification, but existing network-integrating bi-clustering algorithms usually select genes by weighting them with their degree alone, so they are easily disturbed by noisy interactions and misled by missing interactions in the network. This paper therefore proposes a bi-clustering algorithm regularized by a gene interaction network (Network Regularized Bi-Clustering algorithm, NetRBC). NetRBC first obtains the gene-cluster and sample-cluster indicator matrices of the cancer gene expression matrix by minimizing the mean squared residue over the clusters. It then builds a graph regularization term from the gene network and the gene-cluster indicator matrix. Finally, it incorporates this term into a mean-squared-residue-based nonnegative matrix factorization to constrain the joint factorization of the gene-cluster and sample-cluster matrices, with the aim of improving the accuracy of cancer subtype classification. Experiments on multiple cancer gene expression datasets show that NetRBC distinguishes cancer subtypes more accurately than existing related methods.

Cancer subtype identification is crucial for understanding tumor heterogeneity. Existing methods for identifying cancer subtypes have primarily applied traditional clustering algorithms (such as k-means and hierarchical clustering) to gene expression data. These traditional approaches, however, group the data along either the gene dimension or the sample dimension alone, so they cannot discover patterns in which similar genes behave similarly only over a subset of conditions (or samples). Bi-clustering groups large-scale gene expression data along the sample and gene dimensions simultaneously and finds bi-clusters in which related samples exhibit similar expression profiles over a subset of genes, which in turn identifies the corresponding cancer subtypes. The discovered bi-clusters bring insights for categorizing cancer subtypes and for precise gene treatments. Incorporating information from gene-gene interaction networks can further improve the quality of the discovered bi-clusters. However, current efforts generally use the networks only to weight and select genes, so they are often disturbed by noisy interactions and misled by missing interactions. There are many types of bi-clusters, including constant bi-clusters, constant-row bi-clusters, constant-column bi-clusters, coherent-value additive bi-clusters, and coherent-value multiplicative bi-clusters.

To address these limitations and explore multiple types of bi-clusters, this paper introduces a gene-gene interaction Network Regularized Bi-Clustering algorithm (NetRBC) based on Semi-Nonnegative Matrix Tri-Factorization (SNMTF). NetRBC first integrates the mean squared residue into SNMTF and optimizes the gene-cluster and sample-cluster indicator matrices by minimizing the sum-squared loss of the discovered bi-clusters. Next, it constructs a graph regularization term from the gene network and the gene-cluster indicator matrix. The core idea of the regularization term is that if a pair of genes interact with each other, these genes may co-regulate the production of one cancer subtype, so we expect them to be grouped into the same bi-cluster. NetRBC then incorporates the regularization term into the sum-squared-loss-based SNMTF to guide the collaborative factorization that yields the gene-cluster and sample-cluster indicator matrices, and thereby to improve the accuracy of cancer subtype categorization. A regularization parameter controls the contribution of the gene-gene interaction network. We also give an optimization procedure for the gene-cluster and sample-cluster indicator matrices, which uses multiplicative updates to optimize one variable at a time while fixing the others, alternating until convergence. We conduct experiments on six cancer gene expression datasets with known subtypes to compare the performance of NetRBC with that of other methods. We further test NetRBC on two large-scale cancer gene expression datasets from The Cancer Genome Atlas (TCGA) project and use the clinical features of patients to evaluate performance, since the true subtypes of these samples are unknown. Extensive experimental results show that NetRBC groups patients into subtypes better than competing methods, and that the proposed network regularization term significantly improves the accuracy of cancer subtype categorization.
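The network regularization and alternating multiplicative updates described in the abstract can be illustrated with a much simpler model. The sketch below is not NetRBC: it replaces the paper's mean-squared-residue-based SNMTF with a plain two-factor graph-regularized NMF, minimizing ||X - G S^T||_F^2 + lam * tr(G^T (D - W) G) on a nonnegative matrix. The function names, the toy data, and the parameter `lam` are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def objective(X, W, G, S, lam):
    """Reconstruction error plus the gene-network smoothness penalty."""
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian of the gene network
    return np.linalg.norm(X - G @ S.T) ** 2 + lam * np.trace(G.T @ L @ G)

def graph_regularized_nmf(X, W, k, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Illustrative stand-in, NOT the paper's algorithm.

    X: nonnegative expression matrix (genes x samples)
    W: symmetric nonnegative gene-gene adjacency matrix
    Returns gene factors G, sample factors S, and the objective history.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    G = rng.random((n_genes, k)) + 0.1
    S = rng.random((n_samples, k)) + 0.1
    deg = W.sum(axis=1)                      # gene degrees (diagonal of D)
    history = [objective(X, W, G, S, lam)]
    for _ in range(n_iter):
        # Update the sample factors with the gene factors fixed.
        S *= (X.T @ G) / (S @ (G.T @ G) + eps)
        # Update the gene factors with the sample factors fixed; the network
        # enters through W @ G (numerator) and D @ G (denominator).
        G *= (X @ S + lam * (W @ G)) / (G @ (S.T @ S) + lam * deg[:, None] * G + eps)
        history.append(objective(X, W, G, S, lam))
    return G, S, history

# Hypothetical toy data: two gene blocks, each active in one group of samples.
rng = np.random.default_rng(1)
X = np.abs(rng.normal(0.0, 0.1, size=(6, 8)))
X[:3, :4] += 2.0
X[3:, 4:] += 2.0
# Hypothetical network: genes inside the same block interact.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

G, S, history = graph_regularized_nmf(X, W, k=2, lam=0.5)
gene_labels = G.argmax(axis=1)      # hard gene-cluster assignment
sample_labels = S.argmax(axis=1)    # hard sample-cluster (subtype) assignment
print(gene_labels, sample_labels, history[0], history[-1])
```

Here `lam` plays the role of the regularization parameter that controls how much the network contributes: with `lam=0` the updates reduce to ordinary NMF, while larger values pull the rows of `G` for interacting genes toward each other. This two-factor form has a scaling ambiguity between `G` and `S`, so it should be read only as an illustration of the update pattern.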
Authors 王星, 王峻, 余国先, 郭茂祖 (WANG Xing; WANG Jun; YU Guo-Xian; GUO Mao-Zu). Affiliations: College of Computer and Information Science, Southwest University, Chongqing 400715; College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044
Source Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2019, Issue 6, pp. 1274-1288 (15 pages)
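Funding Supported by the National Natural Science Foundation of China (61873214, 61872300, 61741217, 61871020, 61571163, 61532014) and the Chongqing Basic and Frontier Research Project (cstc2018jcyjAX0228, cstc2016jcyjA0351)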
Keywords bi-clustering; sum-squared residue; nonnegative matrix factorization; cancer subtypes; gene networks