Abstract
Protein solubility plays an important role in drug design research. Measuring protein solubility with traditional biological experiments is time-consuming and laborious, so predicting solubility with computational methods has become an important research direction in bioinformatics. To address the problem that traditional solubility prediction models do not fully represent protein features, this paper designs PSPNet, a neural network model based on multiple kinds of protein sequence information, and applies it to protein solubility prediction. PSPNet first represents a protein sequence using amino acid residue sequence embedding information and amino acid sequence evolutionary information. It then uses a convolutional neural network to extract local key information from the amino acid sequence embedding features, and a bidirectional LSTM network to extract long-range dependency features of the protein sequence. Finally, an attention mechanism fuses these features with the amino acid evolutionary information, and the fused feature, which contains multiple kinds of sequence information, is used for protein solubility prediction. Experimental results show that, compared with the baseline methods, PSPNet improves the accuracy of protein solubility prediction and has good scalability.
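The final fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the attention mechanism combines the two feature sets, so everything here is an assumption rather than the authors' method: per-residue vectors are concatenated, scored against a hypothetical learned `query` vector, and pooled with softmax weights. The names `attention_fuse`, `seq_feats`, `evo_feats`, and `query` are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(seq_feats, evo_feats, query):
    """Concatenate per-residue features, then attention-pool over positions.

    seq_feats: one vector per residue (stand-in for BiLSTM outputs).
    evo_feats: one vector per residue (stand-in for an evolutionary profile).
    query:     hypothetical learned vector whose length is the sum of both dims.
    """
    # Assumption: fusion starts with per-residue concatenation of the two views.
    joint = [s + e for s, e in zip(seq_feats, evo_feats)]
    # Dot-product score of each residue against the query vector.
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in joint]
    weights = softmax(scores)
    dim = len(joint[0])
    # Weighted sum over residues gives one fixed-length fused vector.
    fused = [sum(w * vec[d] for w, vec in zip(weights, joint)) for d in range(dim)]
    return fused, weights

# Toy input: 3 residues, 2 sequence dims, 1 evolutionary dim.
seq_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
evo_feats = [[0.5], [0.1], [0.9]]
query = [0.0, 0.0, 0.0]  # a zero query yields uniform attention weights

fused, weights = attention_fuse(seq_feats, evo_feats, query)
```

In a trained model the query would be learned and the fused vector would feed a classifier. With the zero query used here, the pooling reduces to a plain average over residues, which makes the behavior easy to check by hand.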
Authors
NIU Fu-sheng (牛富生)
GUO Yan-bu (郭延哺)
LI Wei-hua (李维华)
LIU Wen-yang (刘文洋)
School of Information Science and Engineering, Yunnan University, Kunming 650500, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2022, Issue 1, pp. 285-291 (7 pages)
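Funding
National Natural Science Foundation of China (32060151)
Scientific Research Fund of the Yunnan Provincial Department of Education (2019J0006)
Yunnan Provincial Innovation Team Project (2018HC019)
Yunnan University Graduate Research and Innovation Fund Project (2020Z73)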
Keywords
Protein solubility
Multi-feature fusion
Deep learning
Attention mechanism