KL-FCM clustering analysis in Illumina Golden Gate DNA methylation microarray
Abstract: DNA methylation is an important epigenetic modification whose level has been found to be closely related to the occurrence and development of disease. Clustering analysis of DNA methylation data is expected to reveal novel disease subtypes and to support effective methods of disease prediction and prognosis. Fuzzy C-means (FCM) is one of the traditional clustering methods, but it is suited to feature spaces that follow a spherical or ellipsoidal distribution and therefore lacks generality. The Illumina Golden Gate platform describes the methylation level of a gene by the methylation percentage at each of its methylation loci. These values lie in (0,1) and follow a beta mixture distribution, so FCM cannot be applied to them directly. This paper therefore proposes the KL-FCM clustering algorithm, which is based on a KL feature measure and uses the K-L distance between samples as the criterion for partitioning them. KL-FCM is then used to cluster the IRIS test dataset and DNA methylation level data for genes. The experimental results show that KL-FCM achieves better clustering performance than k-means and traditional FCM, with a lower computational load.
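The abstract does not give the update equations, so the following is only a minimal sketch of how fuzzy C-means could be run with a K-L dissimilarity in place of the Euclidean distance. It rests on assumptions that are not taken from the paper: each methylation value in (0,1) is treated as a Bernoulli parameter, the dissimilarity is the asymmetric KL(x || c) summed over loci, and the fuzzifier is m = 2. The names `bernoulli_kl` and `kl_fcm` are hypothetical. Because KL(x || c) is a Bregman divergence in x, the membership-weighted mean is the minimizing centroid, so the usual FCM centroid update carries over.

```python
import numpy as np

def bernoulli_kl(X, C, eps=1e-9):
    """KL(x || c) summed over loci (assumption: each value in (0, 1)
    is read as a Bernoulli parameter). X: (n, d), C: (k, d) -> (n, k)."""
    X = np.clip(X, eps, 1 - eps)[:, None, :]
    C = np.clip(C, eps, 1 - eps)[None, :, :]
    return (X * np.log(X / C) + (1 - X) * np.log((1 - X) / (1 - C))).sum(axis=2)

def kl_fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means with a KL dissimilarity (illustrative sketch only).
    Returns memberships U (n, k) and centroids C (k, d)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.dirichlet(np.ones(n_clusters), size=X.shape[0])
    for _ in range(n_iter):
        W = U ** m
        # Centroid update: membership-weighted mean of the samples.
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        # Membership update: u_ik proportional to D_ik ** (-1 / (m - 1)).
        D = np.maximum(bernoulli_kl(X, C), 1e-12)
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, C
```

Calling `U, C = kl_fcm(X, n_clusters=2)` on an array `X` of shape (samples, loci) with values in (0,1) yields soft memberships; `U.argmax(axis=1)` gives a hard assignment if one is needed.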
Source: Chinese Journal of Bioinformatics (《生物信息学》), 2014, No. 2, pp. 106-109 (4 pages)
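Funding: Supported by the China Postdoctoral Science Foundation General Program (2012M511336, 2012M511335), the Jiangsu Province College Students' Innovation and Entrepreneurship Training Program, and the Fok Ying Tung Education Foundation Young Teachers Fund (121066)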
Keywords: Fuzzy C-means; Illumina; DNA methylation microarray; K-L distance