Abstract
N4-methylcytosine (4mC) is an important epigenetic modification that plays key roles in DNA repair, expression and replication. Accurate identification of 4mC sites supports in-depth study of their biological functions and mechanisms. Because experimental identification of 4mC sites is both time-consuming and expensive, and because genomic sequences are accumulating rapidly, effective computational methods are urgently needed to complement experiments. A fast and accurate online platform for predicting 4mC sites is therefore necessary. To date, there has been no comprehensive analysis and evaluation of the different features and machine learning (ML) methods needed to build such predictive models. We constructed multiple feature sets and, using five ML methods (including random forest, support vector machine and ensemble learning), proposed a prediction method called DNA4mcEL. Under random 10-fold cross-validation, DNA4mcEL achieved higher accuracy than existing predictors on all six species: C. elegans, D. melanogaster, A. thaliana, E. coli, G. subterraneus and G. pickeringii. The DNA4mcEL predictor significantly outperformed the existing predictor iDNA4mC on this task. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for the future development of computational methods for N4-methylcytosine site prediction, and help accelerate the discovery of new N4-methylcytosine sites. A standalone version of DNA4mcEL is freely available at https://github.com/kukuky00/DNA4mcEL.git.
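The evaluation protocol named in the abstract, random 10-fold cross-validation over feature-encoded DNA windows, can be sketched as follows. This is not DNA4mcEL's implementation: the abstract does not list its feature sets, so k-mer (here dinucleotide) composition is assumed as one representative encoding, and a nearest-centroid classifier is swapped in for the random forest, SVM and ensemble learners purely so the sketch runs on the Python standard library alone. All function names (`kmer_composition`, `random_kfold`, `nearest_centroid_fit`, `nearest_centroid_predict`, `cross_validated_accuracy`) are hypothetical.

```python
import random
from itertools import product


def kmer_composition(seq, k=2):
    """Normalized k-mer frequency vector over ACGT.

    Assumption: k-mer composition stands in for the paper's unspecified feature sets.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:  # skip windows containing N or other non-ACGT symbols
            counts[index[km]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def random_kfold(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Mean feature vector per class (hypothetical stand-in for RF/SVM/ensembles)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def cross_validated_accuracy(seqs, labels, k=10, seed=0):
    """Pooled accuracy over k random folds; every sample is held out exactly once."""
    X = [kmer_composition(s) for s in seqs]
    correct = 0
    for fold in random_kfold(len(seqs), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [labels[i] for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit(train_X, train_y)
        correct += sum(nearest_centroid_predict(model, X[i]) == labels[i] for i in fold)
    return correct / len(seqs)
```

For example, `cross_validated_accuracy(seqs, labels)` with 0/1 labels for non-4mC and 4mC windows returns the accuracy pooled over the ten held-out folds. In a real pipeline the classifier slot would be filled by learners such as scikit-learn's `RandomForestClassifier` or `SVC`, and the same loop would be repeated per species.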
Authors
GONG Hao; FAN Yong-Xian (College of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China)
Source
Chinese Journal of Biochemistry and Molecular Biology
CAS
CSCD
Peking University Core Journal
2019, No. 6, pp. 633-647 (15 pages)
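Funding
National Natural Science Foundation of China (No. 61762026, No. 61462018)
Guangxi Natural Science Foundation (No. 2017GXNSFAA198278)
Graduate Education Innovation Program of Guilin University of Electronic Technology (No. 2018YJCX47)
Guangxi Key Laboratory of Trusted Software (No. kx201403)
Key Laboratory of Intelligent Processing of Computer Images and Graphics of Guangxi Colleges and Universities (No. GIIP201502)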