Abstract: Protein-protein complexes play an important role in the physiology and pathology of cellular functions and are therefore attractive therapeutic targets. A small subset of residues, known as “hot spots”, accounts for most of the protein-protein binding free energy. Computational methods play a critical role in identifying the hot spots on the protein-protein interface. In this paper, we use a computational alanine scanning method with all-atom force fields to predict hot spots for 313 mutations in 16 protein complexes of known structure. We studied the effect of force fields, solvation models, and conformational sampling on the hot spot predictions. We compared the calculated change in protein-protein interaction energy upon mutation of residues in and near the protein-protein interface with the experimental change in free energy. Of the three commonly used protein force fields (FFs), namely AMBER, Charmm27, and OPLS-2005, the AMBER FF predicted 86% of the hot spots. However, the AMBER FF also showed a high rate of false positives, while the Charmm27 FF yielded 74% correct predictions of the hot spot residues with few false positives. Van der Waals and hydrogen-bonding energies make the largest energy contributions and give a high rate of prediction accuracy, while the desolvation energy was found to contribute little to improving hot spot prediction. Using a conformational ensemble that includes limited backbone movement, instead of a single static structure, leads to better prediction of hot spots.
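The evaluation described in this abstract, comparing calculated changes in interaction energy with experimental changes in free energy and counting correctly predicted hot spots against false positives, can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name, the 1.0 kcal/mol cutoff used to call a residue a hot spot, and the toy values are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def hotspot_prediction_rates(
    calc_ddg: Sequence[float],
    exp_ddg: Sequence[float],
    cutoff: float = 1.0,  # assumed hot spot threshold in kcal/mol (not from the source)
) -> Tuple[float, float]:
    """Return (fraction of experimental hot spots recovered, false-positive rate).

    A mutation counts as a hot spot when its change in energy upon
    alanine substitution is at or above `cutoff`.
    """
    if len(calc_ddg) != len(exp_ddg):
        raise ValueError("calculated and experimental lists must have equal length")

    tp = fp = fn = tn = 0
    for calc, exp in zip(calc_ddg, exp_ddg):
        predicted = calc >= cutoff  # hot spot according to the calculation
        actual = exp >= cutoff      # hot spot according to experiment
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    recovered = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return recovered, false_positive_rate


# Toy values for illustration only; these are not data from the paper.
calc = [2.5, 0.3, 1.4, 0.1, 1.8]
exp = [2.0, 0.2, 0.5, 1.5, 1.2]
recovered, false_positive_rate = hotspot_prediction_rates(calc, exp)
print(f"hot spots recovered: {recovered:.2f}, false-positive rate: {false_positive_rate:.2f}")
```

Reporting the two rates separately mirrors the trade-off noted in the abstract, where one force field recovers more hot spots but at the cost of more false positives.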
Funding: the National Key Research and Development Program of China (grant number 2018YFA0900300), the International Partnership Program of Chinese Academy of Sciences (grant number 153D31KYSB20170121), the Youth Innovation Promotion Association CAS, and the Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (grant numbers TSBICIP-PTJS-001 and TSBICIP-CXRC-018).
Abstract: Revolutionary breakthroughs in artificial intelligence (AI) and machine learning (ML) have had a profound impact on a wide range of scientific disciplines, including the development of artificial cell factories for biomanufacturing. In this paper, we review the latest studies on the application of data-driven methods to the design of new proteins, pathways, and strains. We first briefly introduce the various types of data and databases relevant to industrial biomanufacturing, which are the basis for data-driven research. Different types of algorithms, including traditional ML and more recent deep learning methods, are also presented. We then demonstrate how these data-based approaches can be applied to address various issues in cell factory development using examples from recent studies, including the prediction of protein function, improvement of metabolic models, estimation of missing kinetic parameters, design of non-natural biosynthesis pathways, and pathway optimization. In the last section, we discuss the current limitations of these data-driven approaches and propose that data-driven methods should be integrated with mechanistic models to complement each other and facilitate the development of synthetic strains for industrial biomanufacturing.