Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of the functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.
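The abstract does not spell out how LOWESS smoothing turns structural features into discrete values, so the following is only a rough sketch of the general idea and not Procleave's implementation. It assumes a single numeric structural feature, smooths it with a plain locally weighted linear regression (tricube weights, no robustness iterations), and then maps the smoothed values onto equal-width bins. The function names `lowess` and `discretize`, the `frac` and `n_bins` parameters, and the binning rule are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def tricube(u: float) -> float:
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess(x: Sequence[float], y: Sequence[float], frac: float = 0.5) -> List[float]:
    """Smooth y over x with locally weighted linear fits.

    Simplified sketch: one pass, no robustness re-weighting.
    `frac` is the fraction of points used in each local fit.
    """
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))  # points per local window
    smoothed = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest point (guard against 0).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)  # always > 0: the point itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        # Fall back to the weighted mean if the local fit is degenerate.
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


def discretize(values: Sequence[float], n_bins: int = 3) -> List[int]:
    """Map values onto equal-width bins labelled 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    positions = [0, 1, 2, 3, 4, 5, 6, 7]
    feature = [0.10, 0.30, 0.20, 0.60, 0.50, 0.90, 0.70, 0.95]
    print(discretize(lowess(positions, feature, frac=0.6), n_bins=3))
```

Discrete labels of this kind are the sort of input a CRF-style sequence model can consume alongside sequence-derived features. How Procleave actually selects bin boundaries and smoothing parameters is not described in the abstract.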
Funding: Financially supported by grants from the Australian Research Council (ARC) (Grant Nos. LP110200333 and DP120104460); the National Health and Medical Research Council of Australia (NHMRC) (Grant Nos. APP1127948, APP1144652, and APP490989); the National Institute of Allergy and Infectious Diseases of the National Institutes of Health, USA (Grant No. R01 AI111965); and a Major Inter-Disciplinary Research (IDR) Grant awarded by Monash University, Australia (Grant Nos. 2019-32 and 2018-28). Supported in part by Informatics start-up packages through the School of Medicine, University of Alabama at Birmingham, USA.