Abstract
Polypeptides, also called peptides, are compounds formed by linking α-amino acids through peptide bonds, and they are also products of protein hydrolysis. Peptides strongly influence human growth, development, and metabolism. Some have anticancer, antibacterial, antiviral, or cell-penetrating properties, which makes them valuable for treating the corresponding diseases. Methods for identifying peptides with therapeutic properties are therefore important. However, traditional wet-lab methods for identifying peptides are time-consuming and expensive, and they are not suited to high-throughput sequence data. Existing machine learning-based predictors have greatly improved the efficiency of peptide identification, but their predictive performance and generalization ability remain limited. Moreover, most of them are peptide-specific: a given model can effectively identify only one type of therapeutic peptide. To address these problems, we propose DeepPEPred (deep learning based method for PEptide prediction), a general deep learning-based computational model that can effectively predict several different types of peptides. Ten-fold cross-validation and independent tests were conducted on four datasets: anticancer peptides (ACPs), antibacterial peptides (ABPs), cell-penetrating peptides (CPPs), and surface-binding peptides (SBPs). Compared with PEPred-Suite, the most recent predictor of therapeutic peptides, DeepPEPred improved accuracy by 29.6% and the Matthews correlation coefficient (MCC) by 59.7% on the ACP dataset. On the ABP, CPP, and SBP datasets, it improved accuracy by 1.2% on each, MCC by 2.3%, 2.5%, and 2.4%, and the area under the ROC curve (AUC) by 0.8%, 0.3%, and 1.2%, respectively.
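The abstract reports accuracy, MCC, and AUC under ten-fold cross-validation. The following is a minimal sketch of the standard definitions of these metrics and of a plain (unstratified) 10-fold index split, for readers unfamiliar with them. It is not the authors' evaluation code, which the abstract does not describe. The function names (`confusion_counts`, `k_fold_indices`, etc.), the label convention (1 = positive peptide, 0 = negative), and the choice to return MCC = 0 when its denominator is zero are illustrative assumptions.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = positive peptide, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples: (TP + TN) / all."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero (assumed convention)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """ROC AUC as the probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1, split them into k near-equal folds, and yield (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    scores = [0.9, 0.4, 0.2, 0.1, 0.8, 0.6]
    print(round(accuracy(y_true, y_pred), 4))  # 0.6667
    print(round(mcc(y_true, y_pred), 4))       # 0.3333
    print(round(auc(y_true, scores), 4))       # 0.8889
```

In a real ten-fold run, a model would be trained on each `train` split and scored on the matching `test` split, and the per-fold metrics averaged. Peptide datasets are often class-balanced by construction, but a stratified split (which preserves the positive/negative ratio in every fold) is a common alternative. This sketch does not implement one.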
Authors
LIANG Xiao (梁潇)
WU Hao (吴昊)
LIU Quan-zhong (刘全中)
School of Information Engineering, Northwest A&F University, Yangling 712100, China; Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China
Source
Computer Technology and Development (《计算机技术与发展》)
2021, No. 7, pp. 140-146 (7 pages)
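Funding
National Natural Science Foundation of China, General Program (61972322)
Ministry of Education Humanities and Social Sciences Research Project, Interdisciplinary Program (18YJCZH190)
Fundamental Research Funds, Frontier and Interdisciplinary Science Research Project (2452019180)
Fundamental Research Funds for the Central Universities (2452017342)
Doctoral Research Start-up Fund (2452017019)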
Keywords
polypeptides
deep learning
prediction model
recognition method
feature extraction