Abstract
In nature, more than two-thirds of eukaryotic proteins contain multiple domains. AlphaFold2 has achieved a major breakthrough in end-to-end protein structure prediction, and its accuracy on static single-domain protein structures approaches that of experimental determination. However, prediction accuracy for multi-domain protein structures still needs improvement. In this paper, we propose MDDpre, a method for predicting intra- and inter-domain residue distances in multi-domain proteins. First, features are extracted from multiple sequence alignments (MSAs) and templates: the sequence profile, positional entropy, mutual information, mutual information with background noise removed, average contact potential, an MSA feature matrix, a row attention matrix, and template inter-domain distances. Then, a network that combines triangular updates, axial attention, and convolutional residual blocks predicts the intra- and inter-domain distances of multi-domain proteins. Experimental results on 62 multi-domain proteins show that MDDpre outperforms existing methods and effectively improves the accuracy of intra- and inter-domain residue distance prediction for multi-domain proteins.
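Several of the MSA-derived features listed above (positional entropy, mutual information, and mutual information with background noise removed) can be computed directly from alignment columns. The sketch below illustrates these quantities on a toy alignment. It is not MDDpre's implementation. It assumes an ungapped MSA, uses raw frequencies with no sequence weighting or pseudocounts, and uses the average product correction (APC) as one common way to remove background noise from mutual information. The paper may use a different correction. All function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(msa, j):
    """Shannon entropy (nats) of residue frequencies in MSA column j."""
    n = len(msa)
    counts = Counter(seq[j] for seq in msa)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(msa, i, j):
    """Mutual information between MSA columns i and j from raw frequencies."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in fij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def apc_corrected_mi(msa):
    """MI matrix minus the average product correction (assumed noise removal).

    MI_apc(i, j) = MI(i, j) - mean_i * mean_j / overall_mean, where mean_i is
    the average MI of column i with all other columns.
    """
    length = len(msa[0])
    mi = [[mutual_information(msa, i, j) if i != j else 0.0
           for j in range(length)] for i in range(length)]
    col_mean = [sum(mi[i][j] for j in range(length) if j != i) / (length - 1)
                for i in range(length)]
    overall = sum(col_mean) / length  # mean MI over all off-diagonal pairs
    return [[mi[i][j] - col_mean[i] * col_mean[j] / overall
             if i != j and overall > 0 else 0.0
             for j in range(length)] for i in range(length)]


# Toy alignment: columns 0 and 2 co-vary perfectly, column 1 is conserved.
msa = ["ACDE", "ACDF", "GCHE", "GCHF"]
print(round(column_entropy(msa, 0), 3))       # → 0.693
print(round(apc_corrected_mi(msa)[0][2], 3))  # → 0.231
```

In this toy case, the raw MI between columns 0 and 2 is ln 2. APC subtracts the part attributable to each column's average coupling with the rest of the alignment, which leaves ln 2 / 3.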
Authors
LI Zhangwei (李章维)
ZHANG Fujin (张福金)
ZHAO Kailong (赵凯龙)
ZHANG Guijun (张贵军)
(College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China)
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 8, pp. 1793-1799 (7 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
Supported by the National Key R&D Program of China (2019YFE0126100)
Supported by the National Natural Science Foundation of China (62173304)
Keywords
multi-domain protein
inter-domain distance prediction
attention mechanism
deep learning