Funding: Supported by the National Key R&D Program of China (2019YFA0904303).
Abstract: Prediction of drug-protein binding is critical for virtual drug screening. Many deep learning methods have been proposed to predict drug-protein binding from protein sequences and drug representation sequences. However, most existing methods extract features from the protein and drug sequences separately, so they cannot learn the features that characterize drug-protein interactions. In addition, existing methods usually encode the protein (drug) sequence on the assumption that each amino acid (atom) contributes equally to the binding, ignoring the different impacts that different amino acids (atoms) have on it. In practice, drug-protein binding usually occurs between conserved residue fragments in the protein sequence and atom fragments of the drug molecule. A more comprehensive encoding strategy is therefore required to extract information from the conserved fragments. In this paper, we propose a novel model, named FragDPI, to predict drug-protein binding affinity. Unlike other methods, we encode the sequences based on the conserved fragments and encode the protein and drug into a unified vector. Moreover, we adopt a novel two-step training strategy to train FragDPI. The pre-training step learns the interactions between different fragments using unsupervised learning. The fine-tuning step predicts the binding affinities using supervised learning. The experimental results illustrate the superiority of FragDPI.
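To make the idea of fragment-based encoding into a single unified sequence more concrete, the following is a minimal sketch, not the FragDPI implementation. The fragment vocabularies, the greedy longest-match splitting, the `[CLS]`/`[SEP]`/`[MASK]` tokens, the drug-then-protein ordering, and all function names are assumptions made for illustration; the abstract does not specify any of them. The masking helper shows one common way an unsupervised pre-training objective over fragments could be set up.

```python
import random

# Hypothetical fragment vocabularies (assumption); a real system would
# derive conserved fragments from data rather than hard-code them.
PROTEIN_FRAGMENTS = ["MKT", "AYI", "AK", "QR"]
DRUG_FRAGMENTS = ["C(=O)O", "c1ccccc1", "N"]

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def greedy_fragments(seq, vocab):
    """Split seq by greedy longest match against vocab.

    Characters not covered by any fragment fall back to single-character tokens.
    """
    by_length = sorted(vocab, key=len, reverse=True)
    pieces, i = [], 0
    while i < len(seq):
        match = next((f for f in by_length if seq.startswith(f, i)), seq[i])
        pieces.append(match)
        i += len(match)
    return pieces


def unified_tokens(protein, smiles):
    """Concatenate drug and protein fragments into one token sequence.

    The ordering and separator tokens are assumptions for this sketch.
    """
    return (
        ["[CLS]"]
        + greedy_fragments(smiles, DRUG_FRAGMENTS)
        + ["[SEP]"]
        + greedy_fragments(protein, PROTEIN_FRAGMENTS)
        + ["[SEP]"]
    )


def mask_tokens(tokens, rate=0.15, seed=0):
    """Randomly replace non-special tokens with [MASK].

    Returns the masked sequence and per-position labels (None if unmasked),
    as would be needed for a masked-fragment prediction objective.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok not in SPECIAL_TOKENS and rng.random() < rate:
            masked.append("[MASK]")
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    tokens = unified_tokens("MKTAYIAKQR", "CC(=O)Oc1ccccc1")
    print(tokens)
    print(mask_tokens(tokens, rate=0.3))
```

In a two-step setup of the kind the abstract describes, sequences masked this way would feed the unsupervised pre-training step, while the unmasked unified sequence paired with a measured affinity would feed the supervised fine-tuning step.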