Abstract
Existing methods for identifying differentially methylated regions (DMRs) have three shortcomings: they remove too many methylation sites of weak significance, they restrict the length of the DMRs, and they cannot handle multi-class problems directly. To address these, an algorithm is proposed that uses a sliding window and the k-nearest neighbor (KNN) algorithm to identify DMRs between different classes. The algorithm first screens candidate regions with a sliding window combined with a KNN classifier, then merges the candidate regions according to their error rate to obtain the DMRs. Experiments on real data show that its classification performance and cluster index are markedly better than those of the control algorithms, that it extends the length of the DMRs identified by the control algorithm of Ong, and that it finds DMRs that Ong's algorithm does not.
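The two stages described in the abstract (window screening with KNN, then merging by error rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the window width, step, k, error threshold, or the exact merging rule, so the leave-one-out error estimate, the greedy merge of overlapping windows, and all names (`knn_loo_error`, `window_error`, `find_dmrs`, `max_error`) are assumptions made for illustration, and the data are a made-up toy example.

```python
from collections import Counter


def knn_loo_error(rows, labels, k=3):
    """Leave-one-out error rate of a k-NN classifier (squared Euclidean distance)."""
    n = len(rows)
    wrong = 0
    for i in range(n):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])), j)
            for j in range(n)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] != labels[i]:
            wrong += 1
    return wrong / n


def window_error(data, labels, start, end, k):
    """k-NN error rate using only the sites in the half-open range [start, end)."""
    rows = [sample[start:end] for sample in data]
    return knn_loo_error(rows, labels, k)


def find_dmrs(data, labels, width=3, step=1, k=3, max_error=0.2):
    """Return merged regions as (start, end) pairs with end exclusive.

    Assumed procedure, not taken from the paper:
      1. keep every window whose k-NN error rate is at most max_error;
      2. greedily merge a candidate into the previous region when they
         overlap or touch and the merged region still meets max_error.
    """
    n_sites = len(data[0])
    candidates = [
        (s, s + width)
        for s in range(0, n_sites - width + 1, step)
        if window_error(data, labels, s, s + width, k) <= max_error
    ]
    regions = []
    for start, end in candidates:
        if regions and start <= regions[-1][1]:
            merged = (regions[-1][0], end)
            if window_error(data, labels, merged[0], merged[1], k) <= max_error:
                regions[-1] = merged
                continue
        regions.append((start, end))
    return regions


# Toy data (hypothetical): 6 samples, 3 classes, 8 sites.
# Sites 2-5 separate the classes; sites 0, 1, 6, 7 vary independently of class.
noise = [0.2, 0.8, 0.2, 0.8, 0.2, 0.8]
signal = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95]
data = [[noise[i]] * 2 + [signal[i]] * 4 + [noise[i]] * 2 for i in range(6)]
labels = ["A", "A", "B", "B", "C", "C"]

regions = find_dmrs(data, labels, width=2, step=1, k=1, max_error=0.2)
print(regions)  # [(2, 6)]
```

On this toy input the three overlapping windows that lie entirely within sites 2-5 are merged into a single region, which shows how merging can yield regions longer than the window itself. Because labels are treated as arbitrary classes, the same sketch handles more than two classes without modification.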
Source
Journal of Hangzhou Dianzi University: Natural Sciences (《杭州电子科技大学学报(自然科学版)》)
2016, No. 4, pp. 35-39 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 60903086)
Keywords
differentially methylated regions
sliding window
k-nearest neighbor classifier
multi-class problem
cluster index