Machine learning for synthetic biology: Methods and applications (面向合成生物学的机器学习方法及应用)

Cited by: 9


Abstract (translated from the Chinese): The goal of machine learning is to design algorithms that continually improve their performance based on prior knowledge and observed data. Such algorithms help machines extract knowledge from large amounts of data and thereby improve their performance on specific tasks. As a data-driven approach, machine learning can make effective use of the large volumes of biological data produced by high-throughput experimental techniques, enabling function prediction and intelligent design of synthetic organisms and changing the research paradigm of synthetic biology. This article first introduces several models and methods widely used in synthetic biology, such as support vector machines, neural networks, generative adversarial networks, and deep reinforcement learning. It then presents typical applications of machine learning in synthetic biology, such as promoter prediction, enzyme catalysis design, metabolic pathway construction, and genetic circuit design. The article reviews machine learning methods and applications for synthetic biology and aims to help readers choose and design machine learning methods for synthetic biology research.

Abstract (English): Traditional synthetic biology takes a trial-and-error approach, which suffers from inefficiency and local optima. Recent advances in high-throughput experimental techniques generate a huge amount of biological data, which enables the use of machine learning to close the "design-build-test-learn" loop. Machine learning, especially deep learning, is a data-driven modeling method that extracts useful patterns from big data and then leverages the learned knowledge to tackle specific tasks. In this review, we aim to provide a brief primer on machine learning for synthetic biologists. Starting with a common taxonomy, we introduce representative methods, pipelines, and underlying principles of machine learning that can be applied in synthetic biology. We include typical methods such as support vector machines, deep neural networks, generative adversarial nets, transfer learning, and reinforcement learning. In particular, discriminative models, including convolutional neural networks and support vector machines, are appropriate for predicting sequence-function relationships. Generative models, including generative adversarial nets (GANs) and deep generative models for graph generation, are suitable for sequence or network design. Next, we review recent applications of machine learning in studying synthetic biology parts and modules, including promoters, bioactive peptides, enzymes, metabolic pathways, and genetic circuits. For example, DeePromoter combined a convolutional neural network and a long short-term memory network to achieve an accuracy as high as 90% when predicting promoter sequences. For enzyme design, a Gaussian process model was combined with Bayesian optimization using the upper confidence bound method, which led to the engineering of thermostable P450 enzymes. For antimicrobial peptides, a GAN enhanced with a feedback mechanism was trained to design peptide sequences with new functions. Finally, we conclude with future challenges and directions. In particular, interpretable machine learning models are desirable to guide mechanistic investigation. Moreover, it is necessary to develop new machine learning methods that are more compatible with biological data, which are heterogeneous, multi-modal (such as sequence, network, image, and structure), and lack proper labels. With the increasing availability of big biological data and the development of machine learning methods tailored for synthetic biology, we envision a paradigm shift towards a closed cycle of "design-build-test-learn" in creating artificial life with predictable functions.

Authors: Ruyun Hu (胡如云); Songya Zhang (张嵩亚); Hailin Meng (蒙海林); Han Yu (余函); Jianzhi Zhang (张建志); Xiaozhou Luo (罗小舟); Tong Si (司同); Chenli Liu (刘陈立); Yu Qiao (乔宇)

Author addresses: Institute of Advanced Computing and Digital Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen 518055, China; Center for Biological Engineering, Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou 511458, China

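Affiliations: Institute of Advanced Computing and Digital Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Shenzhen Institute of Synthetic Biology; CAS Key Laboratory of Quantitative Engineering Biology; Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences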

Source: Chinese Science Bulletin (《科学通报》), 2021, Issue 3, pp. 284-299 (16 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

Funding: Supported by a project of the Shenzhen Science and Technology Innovation Commission (KQTD2015033117210153).

Keywords: machine learning; synthetic biology; synthetic biology parts design; bio-networks design

Classification code: Q819 [Biology: Bioengineering]


