Abstract
Previous studies have used different features to predict and analyze DNA methylation. These features have not yet been compared and evaluated in a unified way, and few studies have analyzed non-CpG island sequences. To provide such a comparison, this study collected the main DNA methylation features used in published work, analyzed both CpG island and non-CpG island sequences, evaluated the importance of each feature, and used feature selection to identify a compact subset of important features. The analysis shows that sequence patterns and histone modifications are both important features associated with DNA methylation. They are not independent, however: together they help determine and/or maintain the methylation pattern of genomic DNA. In particular, H3K4me3 has the highest selection frequency and is the most important histone feature for both CpG island and non-CpG island sequences. The key DNA methylation features identified here have important biological functions and may serve as clues for studying the relationships among DNA methylation, histone modification and gene regulation.
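The abstract does not say which feature-selection algorithm produced the "selection frequency" ranking. As an illustration of the idea only, the Python sketch below uses a simple stand-in. Each feature is scored with a Fisher score on bootstrap resamples of labeled sequences, the top k features are kept in each round, and the sketch reports how often each feature was kept. This filter method is swapped in for illustration and is not the authors' SVM-based procedure. The function names, the 1/0 label encoding and the toy data are all hypothetical.

```python
# Hypothetical sketch: bootstrap selection frequency with a Fisher-score filter.
# Not the paper's method; names, labels and data are illustrative assumptions.
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: (mean1 - mean0)^2 / (var1 + var0).

    labels: 1 = methylated, 0 = unmethylated (assumed encoding).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        return 0.0  # too few samples in a class to estimate spread
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def selection_frequency(features, labels, k, rounds=100, seed=0):
    """Fraction of bootstrap rounds in which each feature ranks in the top k."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        scores = {name: fisher_score([vals[i] for i in idx], ys)
                  for name, vals in features.items()}
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(top_k)
    return {name: counts[name] / rounds for name in features}


# Synthetic toy data (not from the paper): one informative feature, two noise features.
rng = random.Random(1)
labels = [1] * 20 + [0] * 20
features = {
    "informative": [rng.gauss(0.0, 0.1) if y == 1 else rng.gauss(5.0, 0.1) for y in labels],
    "noise_a": [rng.random() for _ in labels],
    "noise_b": [rng.random() for _ in labels],
}
freq = selection_frequency(features, labels, k=1)
print(freq)  # → {'informative': 1.0, 'noise_a': 0.0, 'noise_b': 0.0}
```

A feature that separates the two classes in nearly every resample approaches a frequency of 1.0. In the paper's setting, the inputs would be sequence-pattern and histone-modification measurements rather than toy values.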
Source
Acta Biophysica Sinica (《生物物理学报》)
Indexed in: CAS, CSCD, Peking University Core Journals
2012, Issue 11, pp. 910-922 (13 pages)
Funding
National Natural Science Foundation of China (grants 60903086, 11126090, 61070127)
Keywords
DNA methylation
CpG island
Feature selection
Classification
Support vector machine