The protein inverse folding problem, designing amino acid sequences that fold into desired protein structures, is a critical challenge in biological sciences. Despite numerous data-driven and knowledge-driven methods, there remains a need for a user-friendly toolkit that effectively integrates these approaches for in-silico protein design. In this paper, we present DIProT, an interactive protein design toolkit. DIProT leverages a non-autoregressive deep generative model to solve the inverse folding problem, combined with a protein structure prediction model. This integration allows users to incorporate prior knowledge into the design process, evaluate designs in silico, and form a virtual design loop with human feedback. Our inverse folding model demonstrates competitive performance in terms of effectiveness and efficiency on TS50 and CATH4.2 datasets, with promising sequence recovery and inference time. Case studies further illustrate how DIProT can facilitate user-guided protein design.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62250007, 62225307, 61721003) and a grant from the Guoqiang Institute, Tsinghua University (2021GQG1023).