An antibacterial peptides recognition method based on BERT and Text-CNN
Cited by: 2
Abstract: Antimicrobial peptides (AMPs) are small-molecule peptides that are widely found in living organisms and have broad-spectrum antibacterial activity and immunomodulatory effects. Because resistance to them emerges slowly, and because they have excellent clinical potential and a wide range of applications, AMPs are a strong alternative to conventional antibiotics. AMP recognition is an important direction in AMP research. Wet-lab methods are costly, inefficient, and slow, which prevents them from meeting the need for large-scale AMP recognition. Computer-aided identification methods are therefore an important supplement to AMP recognition approaches, and a key issue is how to improve their accuracy. Protein sequences can be approximated as a language composed of amino acids, so rich features may be extracted using natural language processing (NLP) techniques. In this paper, we combine the pre-trained model BERT with the fine-tuning structure Text-CNN from the NLP field to model protein language, provide an open-source AMP recognition tool, and compare it with five other published AMP recognition tools. The experimental results show that optimizing the "pre-training and fine-tuning" strategy brings an overall improvement in accuracy, sensitivity, specificity, and Matthews correlation coefficient, offering a new approach for further research on AMP recognition algorithms.
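The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the toy labels are hypothetical, and the sketch assumes that label 1 denotes an AMP and label 0 a non-AMP.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and MCC for 0/1 labels.

    Assumption for illustration: 1 = AMP (positive), 0 = non-AMP (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on AMPs
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-AMPs

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Toy example with made-up labels (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the AMP and non-AMP classes are imbalanced.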
Authors: XU Xiaofang; YANG Chunde; SHU Kunxian; YUAN Xinpu; LI Mocheng; ZHU Yunping; CHEN Tao
Affiliations: The School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Institute of Lifeomics, Academy of Military Medical Sciences, Academy of Military Sciences, Beijing 102206, China; Chongqing Key Laboratory on Big Data for Bio-Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Department of General Surgery, First Medical Center, Chinese PLA General Hospital, Beijing 102206, China; State Key Laboratory of High Performance Computing, Institute for Quantum Information, College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China
Source: Chinese Journal of Biotechnology (《生物工程学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, Issue 4, pp. 1815-1824 (10 pages)
Funding: National Key Research and Development Program of China (2021YFA1301603).
Keywords: protein; antibacterial peptides; language model; pre-training