Machine Learning Optimization Method for Overlapping Genes Annotation in Escherichia coli Genomes (cited by: 3)
Abstract: Bacterial genomes contain many overlapping genes. Overlap not only reduces genome size and makes more efficient use of genetic information, but also takes part in gene regulation at the transcriptional and post-transcriptional levels. Although some features of overlapping genes have been described, the causes of their formation remain unclear, and there is a lack of feature information for predicting whether an overlapping gene is present, which hinders the annotation of overlapping genes. In this study, a convolutional neural network (CNN), a machine learning algorithm, was used to scan gene-related regions. The scan indicated that the first 54 bp of the coding region can serve as marker information for identifying overlapping genes, and a support vector machine (SVM) algorithm was used to confirm the accuracy of this result. After training and optimization, a CNN model was successfully constructed and applied to the annotation of overlapping genes in Escherichia coli genomes, which is of significance for the study of overlapping genes. The trained models and usage instructions are available on GitHub: https://github.com/breadpot/Convolutional_Neural_Network_Bacteria_overlapping_genes_prediction
Authors: DU Ming-Lun (杜明伦); HUANG Jun-Jun (黄君君); MA Xiang (马香); TANG Yan-Qiong (唐燕琼); LIU Zhu (刘柱) (Department of Biotechnology, Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China)
Source: Chinese Journal of Biochemistry and Molecular Biology (《中国生物化学与分子生物学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2018, No. 8, pp. 861-867 (7 pages)
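Funding: International Cooperation Program of the Ministry of Science and Technology (No. 2015DFR31060); Key Research and Development Program of Hainan Province (No. ZDYF2017020); Natural Science Foundation of Hainan Province (No. 317015); National Natural Science Foundation of China (Nos. 31560021, 31772887)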
Keywords: overlapping genes; machine learning; convolutional neural network; functional annotation; support vector machine
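
The abstract describes scanning gene regions with a convolutional neural network and singling out the first 54 bp of the coding region as the informative window. As a rough illustration of how such a window might be turned into numeric input for a model, the sketch below one-hot encodes the first 54 bases of a coding sequence. The encoding scheme, the function name `one_hot_window`, and the example sequence are all assumptions made for illustration; the abstract does not state how the authors prepared their inputs, and the repository linked above should be consulted for the actual procedure.

```python
from typing import List

# Hypothetical preprocessing sketch. The one-hot scheme is an assumption,
# not taken from the paper or its repository.
WINDOW = 54        # window length named in the abstract (first 54 bp of the coding region)
ALPHABET = "ACGT"  # column order of the one-hot matrix


def one_hot_window(cds: str, window: int = WINDOW) -> List[List[int]]:
    """Return a window x 4 one-hot matrix for the first `window` bases of `cds`.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    seq = cds.upper()
    if len(seq) < window:
        raise ValueError(f"sequence shorter than {window} bp")
    return [
        [1 if base == letter else 0 for letter in ALPHABET]
        for base in seq[:window]
    ]


if __name__ == "__main__":
    # Made-up 63 bp sequence, not real E. coli data.
    example = "ATG" + "GCT" * 20
    matrix = one_hot_window(example)
    print(len(matrix), len(matrix[0]))  # rows, columns
    print(matrix[:3])                   # encoded start codon
```

A matrix of this shape (54 positions by 4 channels) is a common input layout for one-dimensional convolutions over DNA, and the same rows flattened into a single vector could be passed to a support vector machine. Whether the published model uses this layout is not confirmed by the abstract.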