Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62250007, 62225307, 61721003) and a grant from the Guoqiang Institute, Tsinghua University (2021GQG1023).
Abstract: The protein inverse folding problem, designing amino acid sequences that fold into desired protein structures, is a critical challenge in biological sciences. Despite numerous data-driven and knowledge-driven methods, there remains a need for a user-friendly toolkit that effectively integrates these approaches for in silico protein design. In this paper, we present DIProT, an interactive protein design toolkit. DIProT leverages a non-autoregressive deep generative model to solve the inverse folding problem, combined with a protein structure prediction model. This integration allows users to incorporate prior knowledge into the design process, evaluate designs in silico, and form a virtual design loop with human feedback. Our inverse folding model demonstrates competitive performance in terms of effectiveness and efficiency on the TS50 and CATH4.2 datasets, with promising sequence recovery and inference time. Case studies further illustrate how DIProT can facilitate user-guided protein design.
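Sequence recovery, the metric reported in the abstract above, is commonly computed as the fraction of positions at which a designed sequence matches the native one. The following is a minimal Python sketch under that assumption, for equal-length, pre-aligned sequences. The function name `sequence_recovery` and the example sequences are hypothetical and are not taken from DIProT.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical 8-residue example: 6 of 8 positions match.
print(sequence_recovery("MKTAYIAK", "MKTVYIGK"))  # 0.75
```

Benchmark figures are usually averages of this per-structure value over a test set; whether the paper uses the mean or the median is not stated in the abstract.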
Funding: Supported by the National Basic Research Program of China (Grant No. 2015CB910300), the National High Technology Research and Development Program of China (Grant No. 2012AA020308), and the National Natural Science Foundation of China (Grant No. 11021463).
Abstract: Computational design of proteins is a relatively new field, in which scientists search the enormous sequence space for sequences that can fold into a desired structure and perform desired functions. With the computational approach, proteins can be designed, for example, as regulators of biological processes, novel enzymes, or biotherapeutics. These approaches not only provide valuable information for understanding sequence-structure-function relations in proteins, but also hold promise for applications in protein engineering and biomedical research. In this review, we briefly introduce the rationale for computational protein design, then summarize recent progress in the field, including de novo protein design, enzyme design, and the design of protein-protein interactions. Challenges and future prospects of the field are also discussed.
Funding: Supported by the National Natural Science Foundation of China (21101091, 31370812) and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China. J. Wang is supported by the National Basic Research Program of China (2010CB912301, 2009CB825505) and the National Natural Science Foundation of China (90913022). Y. Lu is supported by the US National Institutes of Health (GM062211).
Abstract: Rational protein design is a powerful strategy, not only for revealing the structure-function relationship of natural metalloproteins, but also for creating artificial metalloproteins with improved properties and functions. Myoglobin (Mb), a small heme protein created by nature with diverse functions, has been shown to be an ideal scaffold for rational protein design. The progress reviewed herein includes fine-tuning its native functions of O2 binding and transport, peroxidase activity and nitrite reductase (NIR) activity, and rationally expanding its functionalities to peroxygenase, heme-copper oxidase (HCO), nitric oxide reductase (NOR), and hydroxylamine reductase. These studies have enhanced our understanding of how metalloproteins work in nature and have provided insights for the rational design of functional metalloproteins for future practical applications.
Abstract: Proteins perform a variety of functions in living organisms and their functions are largely determined by their shape. In this paper, we propose a novel mathematical method for designing protein-like molecules of a given shape. In the mathematical model, molecules are represented as loops of n-simplices (2-simplices are triangles and 3-simplices are tetrahedra). We design a new molecule of a given shape by patching together a set of smaller molecules that cover the shape. The covering set of small molecules is defined using a binary relation between sets of molecules. A new molecule is then obtained as a sum of the smaller molecules, where addition of molecules is defined using transformations acting on a set of (n + 1)-dimensional cones. Due to page limitations, only the two-dimensional case (i.e., loops of triangles) is considered. No prior knowledge of Sheaf Theory, Category Theory, or Protein Science is required. The author hopes that this paper will encourage further collaboration between Mathematics and Protein Science.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 10174005 and 30170230) and the Beijing Natural Science Foundation (No. 5032002).
Abstract: A new, effective, and fast minimization approach based entirely on physical theory is proposed for protein design. The sequence space is searched essentially according to the Boltzmann distribution. In this approach, the relative entropy is used as the objective function to be minimized. The method has been tested on an off-lattice model of proteins, and the results are better than those obtained in other similar work. It can therefore serve as a unified framework for both folding and inverse folding of proteins.
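The two ingredients named in the abstract above, a Boltzmann distribution and a relative entropy objective, can be illustrated with their standard textbook definitions. The sketch below is only that: the abstract does not give the paper's exact objective, so the function names, the three-state example, and the choice of `kT = 1.0` are all assumptions made for illustration.

```python
import math


def boltzmann(energies, kT=1.0):
    """Boltzmann probabilities p_i = exp(-E_i / kT) / Z for a list of energies."""
    e_min = min(energies)  # shift energies so the largest weight is exp(0)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) = sum_i p_i * ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical three-state example: compare a target distribution with the
# Boltzmann distribution induced by some trial energies.
target = [0.7, 0.2, 0.1]
model = boltzmann([0.0, 1.0, 2.0])
print(relative_entropy(target, model))
```

The relative entropy is zero only when the two distributions coincide, which is what makes it usable as a quantity to minimize.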
Abstract: In the present study, we further develop the recently proposed method for protein design based on the relative entropy. The new approach can be used in more general situations, beyond the special limits of the previous method. The results indicate that our generalized method increases the prediction precision for protein sequences and will benefit the study of protein design.
Funding: Supported by the Natural Science Foundation of Guangxi Province (Nos. 2013GXNSFAA019019 and 2013GXNSFAA019041).
Abstract: Oleanolic acid derivatives act as newer protein tyrosine phosphatase 1B (PTP-1B) inhibitors for type 2 diabetes mellitus (T2DM). To understand the structural requirements of PTP-1B inhibitors, 52 oleanolic acid derivatives were divided into a training set (34 compounds) and a test set (18 compounds). Highly reliable and predictive 3D-QSAR models were constructed by the CoMFA, CoMSIA and topomer CoMFA methods, respectively. The cross-validated coefficient (q²) and non-cross-validated coefficient (R²) were 0.554 and 0.999 in the CoMFA model, 0.675 and 0.971 in the CoMSIA model, and 0.628 and 0.939 in the topomer CoMFA model, suggesting that the three models are robust and have good external predictive capability. Furthermore, ten novel inhibitors with much higher inhibitory potency were designed. Our design strategy was to (i) introduce electronegative substituents (Cl, -CH2OH, OH and -CH2Cl) at the double bond of ring C and (ii) attach hydrogen bond acceptor groups (C≡N and N atom), electronegative groups (C≡N, N atom, -COOH and -COOCH3) and bulky substituents (C6H5N) at the C-3 position, which should yield potent and selective PTP-1B inhibitors. We expect that these results can facilitate the design and development of new potent PTP-1B inhibitors.
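The two coefficients quoted in the abstract above differ in how the predictions are obtained: the non-cross-validated R² scores a model on the data it was fitted to, while the cross-validated q² scores leave-one-out predictions. The Python sketch below illustrates that difference only. It assumes ordinary least squares on a single hypothetical descriptor, whereas CoMFA-type models use partial least squares on field descriptors, and all function names and data are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r2(xs, ys):
    """Non-cross-validated R^2 = 1 - SS_res / SS_tot on the fitted data."""
    slope, intercept = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q2_loo(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot (SS_tot about the overall mean)."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot


# Hypothetical descriptor values and activities for five compounds.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
print(r2(xs, ys), q2_loo(xs, ys))
```

With this definition q² cannot exceed R² for a least-squares fit, so a large gap between the two is usually read as a sign of overfitting.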
Funding: Supported by SIP-IPN, CONACYT (CB-168116), and FIS/IMSS (FIS/IMSS/PROT/G11-2/1013).
Abstract: In the last few years, there have been important new insights into the structural biology of G-protein coupled receptors. It is now known that allosteric binding sites are involved in the affinity and selectivity of ligands for G-protein coupled receptors, and that signaling by these receptors involves both G-protein dependent and independent pathways. The present review outlines the physiological and pharmacological implications of this perspective for the design of new drugs to treat disorders of the central nervous system. Specifically, new possibilities are explored in relation to allosteric and orthosteric binding sites on dopamine receptors for the treatment of Parkinson's disease, and on muscarinic receptors for Alzheimer's disease. Future research can seek to identify ligands that can bind to more than one site on the same receptor, or simultaneously bind to two receptors and form a dimer. For example, the design of bivalent drugs that can reach homo/hetero-dimers of the D2 dopamine receptor holds promise as a relevant therapeutic strategy for Parkinson's disease. Regarding the treatment of Alzheimer's disease, the design of dualsteric ligands for mono-oligomeric muscarinic receptors could increase therapeutic effectiveness by generating potent compounds that could activate more than one signaling pathway.
Abstract: This paper proposes a novel category-theoretic approach to describing protein shape, i.e., a description of the shape by a set of algebraic equations. The focus of the approach is on the relations between proteins, rather than on the proteins themselves. Knowledge of category theory is not required, as mathematical notions are defined concretely. In this paper, proteins are represented as closed trajectories (i.e., loops) of flows of triangles. The relations between proteins are defined using the fusion and fission of loops of triangles, where allostery occurs naturally. The shape of a protein is then described with quantities that are measurable with unit elements called “unit loops”. That is, a protein's shape is described with the loops that are obtained by the fusion of unit loops. Measurable loops are called “integral”. In the approach, the unit loops play a role similar to the role “1” plays in the set Z of integers. In particular, the author considers two categories of loops, the “integral” loops and the “rational” loops. Rational loops are then defined using algebraic equations with “integral loop” coefficients. Because of this approach, the theory has some similarities to quantum mechanics, where only observable quantities are admitted in physical theory. The author believes that this paper not only provides a new perspective on protein engineering, but also promotes further collaboration between biology and other disciplines.
Funding: Supported by the Foundation for the Author of National Excellent Doctoral Dissertation of China (200525) and the Science and Technology Program of Wuhan City (20067003111-07).
Abstract: Some highly designable protein structures have dents on the surface of their native structures and are not fully compact. According to the hydrophobic-polar (HP) model, the most designable structures are fully compact. To investigate the designability of the dented structures, we introduce hydrogen bond energy in the secondary structures by using the secondary-structure-favored HP model proposed by Ou-yang et al. The results show that the average designability increases with the strength of the hydrogen bond. The designabilities of structures with the same dented shape increase exponentially with the number of secondary structure sites. The dented structures can have the highest designabilities for a certain value of hydrogen bond energy density.
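For orientation, the plain HP model that the abstract above builds on scores a lattice conformation by counting contacts between hydrophobic (H) residues that are lattice neighbours but not adjacent along the chain. The Python sketch below implements only that basic energy on a 2D square lattice, with an assumed contact energy of -1. It does not include the hydrogen bond term of the secondary-structure-favored variant, and the function name and example conformations are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice: -1 per non-bonded H-H contact."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    position = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up visits each neighbouring pair once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


# Hypothetical 2x2 square conformation: residues 0 and 3 touch without being bonded.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # -1
```

Designability in this setting is the number of sequences that have a given structure as their unique lowest-energy conformation, which requires enumerating conformations and sequences on top of an energy function like this one.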