Biomanufacturing, which uses renewable resources as raw materials and biological processes to produce energy and chemicals, has long been regarded as a production model that can replace the unsustainable fossil economy. Constructing non-natural, efficient biosynthesis routes for chemicals is an important goal of green biomanufacturing. Traditional, experience-based methods struggle to support this goal. However, with the rapid development of information technology, intelligent biomanufacturing offers hope of achieving it. Retrobiosynthesis and computational enzyme design, two of the main technologies in intelligent biomanufacturing, have developed rapidly in recent years and achieved a great deal, and some representative works have demonstrated the value that integrating the two fields may bring. To achieve that integration, it is necessary to examine the information, methods, and tools of both fields from a bird's-eye view and to find a feasible way to establish a connection point between them. To this end, this article briefly reviews the main ideas, methods, and tools of the two fields and puts forward views on how to integrate them.
Computational design of proteins is a relatively new field in which scientists search the enormous sequence space for sequences that can fold into a desired structure and perform desired functions. With the computational approach, proteins can be designed, for example, as regulators of biological processes, as novel enzymes, or as biotherapeutics. These approaches not only provide valuable information for understanding sequence-structure-function relations in proteins, but also hold promise for applications in protein engineering and biomedical research. In this review, we briefly introduce the rationale for computational protein design, then summarize recent progress in the field, including de novo protein design, enzyme design, and the design of protein-protein interactions. Challenges and future prospects of the field are also discussed.
Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used to predict enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating the machine learning model's performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the model's classification performance in the new enzyme evaluation in three of the seven datasets, namely those with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the ESM-2 descriptor achieved the best results. Using enzyme families as labels showed that the descriptors could cluster proteins well, which could explain their contribution to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors yielded a significant improvement in four of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to obtain better results in the new enzyme evaluation, and that datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
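The three evaluation scenarios in this abstract differ only in what is held out of training. The following is a minimal sketch of how such splits could be built, not the authors' pipeline: the record layout `(enzyme, chemical, label)`, the function names, and the toy data are all assumptions made for illustration.

```python
import random

def pair_folds(records, k=10, seed=0):
    """New relationship evaluation: hold out random (enzyme, chemical) pairs.
    Enzymes and chemicals in the test fold may still appear in training."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]

def group_folds(records, field, k=10, seed=0):
    """Hold out whole groups so test members are never seen in training.
    field=0 groups by enzyme (new enzyme evaluation);
    field=1 groups by chemical (new chemical evaluation)."""
    groups = sorted({r[field] for r in records})
    random.Random(seed).shuffle(groups)
    fold_of = {g: i % k for i, g in enumerate(groups)}
    folds = [[] for _ in range(k)]
    for i, r in enumerate(records):
        folds[fold_of[r[field]]].append(i)
    return folds

def train_test(folds, test_fold):
    """Return (train indices, test indices) for one cross-validation round."""
    test = folds[test_fold]
    train = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    return train, test

# Hypothetical toy data: 6 enzymes x 5 chemicals with an arbitrary binary label.
records = [(f"E{e}", f"C{c}", (e + c) % 2) for e in range(6) for c in range(5)]

folds = group_folds(records, field=0, k=3)   # new enzyme evaluation
train, test = train_test(folds, 0)
train_enzymes = {records[i][0] for i in train}
test_enzymes = {records[i][0] for i in test}
```

Grouping by enzyme or by chemical is what makes the "new enzyme" and "new chemical" scores a measure of generalization to unseen entities; a plain random split over pairs only measures the "new relationship" case.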
Many biochemical reactions in biosynthetic pathways have no associated enzymes. Reactions predicted by retro-biosynthetic tools are not assigned gene sequences, and non-natural reactions designed with novel functions also lack suitable enzymes. All of these can be categorized as orphan reactions. The absence of protein-encoding genes for orphan reactions limits their direct experimental implementation. Computational tools have been developed to find candidate enzymes for them. Herein, we discuss recent advances in these tools, including reaction similarity-based methods that calculate the substructural similarity between orphan reactions and known enzymatic reactions; sequence-based tools that combine metabolic knowledge bases and phenotypic information with genomic, transcriptomic, and metabolomic data to mine appropriate enzymes for orphan reactions; and approaches that create enzyme variants for orphan reactions through enzyme engineering or de novo enzyme design. We believe that our review will facilitate the design of microbial cell factories and contribute to the development of the biomanufacturing field.
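The reaction similarity-based idea mentioned in this abstract can be illustrated with a small sketch. It assumes each reaction has already been reduced to a set of substructure features (in practice these would come from a reaction fingerprint) and ranks known enzymatic reactions by Tanimoto similarity to the orphan reaction. The feature strings, reaction names, and function names are hypothetical; no specific tool from the review is being reproduced.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(orphan_features, known_reactions):
    """Rank known enzymatic reactions by similarity to an orphan reaction.
    known_reactions maps a reaction name to its feature set.
    Returns (score, name) tuples, most similar first."""
    scored = [(tanimoto(orphan_features, feats), name)
              for name, feats in known_reactions.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))

# Hypothetical substructure features for illustration only.
orphan = {"C=O", "C-OH", "NAD"}
known = {
    "rxn_A": {"C=O", "C-OH", "NAD", "C-N"},
    "rxn_B": {"C-N", "P-O"},
}
ranked = rank_candidates(orphan, known)
```

The enzymes annotated to the top-ranked known reactions would then serve as candidates to test on the orphan reaction.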
Funding (review of retrobiosynthesis and computational enzyme design): Supported by the National Natural Science Foundation of China (U1663227, 21861132017, 21811530003, 21878170).
Funding (review of computational protein design): Supported by the National Basic Research Program of China (Grant No. 2015CB910300), the National High Technology Research and Development Program of China (Grant No. 2012AA020308), and the National Natural Science Foundation of China (Grant No. 11021463).
Funding (enzyme and chemical descriptor study): Supported by the National Key Research and Development Program of China (No. 2022YFC2105900).
Funding (review of enzyme mining for orphan reactions): Supported by the National Natural Science Foundation of China (22138006).