Abstract
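The goal of machine learning is to design algorithms that continually improve their performance using prior knowledge and observed data. Such algorithms help machines extract knowledge from large amounts of data and thereby improve their performance on specific tasks. As a data-driven approach, machine learning can make effective use of the large volumes of biological data produced by high-throughput experimental techniques to predict the functions of synthetic organisms and design them intelligently, changing the research paradigm of synthetic biology. This paper first introduces several machine learning models and methods widely used in synthetic biology, such as support vector machines, neural networks, generative adversarial networks, and deep reinforcement learning. It then presents typical applications of machine learning in synthetic biology, such as promoter prediction, enzyme catalysis design, metabolic pathway construction, and genetic circuit design. The paper reviews machine learning methods and applications for synthetic biology and aims to help readers select and design machine learning methods for synthetic biology research.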
Traditional synthetic biology takes a trial-and-error approach, which is inefficient and prone to local optima. Recent advances in high-throughput experimental techniques generate a huge amount of biological data, which enables the use of machine learning to close the “design-build-test-learn” loop. Machine learning, especially deep learning, is a data-driven modeling method that extracts useful patterns from big data and then leverages the learned knowledge to tackle specific tasks. In this review, we aim to provide a brief primer on machine learning for synthetic biologists. Starting with a common taxonomy, we introduce representative methods, pipelines, and underlying principles of machine learning that can be applied in synthetic biology. We include typical methods such as support vector machines, deep neural networks, generative adversarial nets, transfer learning, and reinforcement learning. In particular, discriminative models, including convolutional neural networks and support vector machines, are appropriate for predicting sequence-function relationships. Generative models, including generative adversarial nets (GANs) and deep generative models for graph generation, are suitable for sequence or network design. Next, we review recent applications of machine learning in studying synthetic biology parts and modules, including promoters, bioactive peptides, enzymes, metabolic pathways, and genetic circuits. For example, DeePromoter combined a convolutional neural network and a long short-term memory network to achieve an accuracy as high as 90% when predicting promoter sequences. For enzyme design, a Gaussian process model was combined with Bayesian optimization using the upper confidence bound method, which resulted in the engineering of thermostable P450 enzymes. For antimicrobial peptides, a GAN model enhanced with a feedback mechanism was trained to design peptide sequences with new functions. Finally, we conclude with future challenges and directions. In particular, interpretable machine learning models are desirable to guide mechanistic investigation. Moreover, it is necessary to develop new machine learning methods that are more compatible with biological data, which are heterogeneous, multi-modal (such as sequence, network, image, and structure), and lack proper labels. With the increasing availability of big biological data and the development of machine learning methods tailored for synthetic biology, we envision a paradigm shift towards a closed cycle of “design-build-test-learn” in creating artificial life with predictable functions.
Authors
胡如云
张嵩亚
蒙海林
余函
张建志
罗小舟
司同
刘陈立
乔宇
Ruyun Hu; Songya Zhang; Hailin Meng; Han Yu; Jianzhi Zhang; Xiaozhou Luo; Tong Si; Chenli Liu; Yu Qiao (Institute of Advanced Computing and Digital Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen 518055, China; Center for Biological Engineering, Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou 511458, China)
Source
《科学通报》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2021, Issue 3, pp. 284-299 (16 pages)
Chinese Science Bulletin
Funding
Supported by a project of the Shenzhen Science and Technology Innovation Commission (KQTD2015033117210153).
Keywords
machine learning
synthetic biology
synthetic biology parts design
bio-networks design