Abstract
Applying artificial intelligence to protein design has produced a number of large models. Computational protein design is the process of using computational methods to help determine an amino acid sequence that realizes a preset structure and function. It can take the form of redesigning an existing protein or of de novo design. Rapidly generating proteins with specific functions is of great significance for biomedical research, drug development, and bioengineering. This article first reviews computational protein design across traditional computational methods, machine learning methods, and deep learning methods. It then introduces the Transformer, the core architecture of large language models, and surveys research on and applications of protein large language models by category. Finally, it discusses priorities for future research.
Authors
张锦雄
孟雪莉
陈燕
韦松键
吕丽兰
胡小春
ZHANG Jinxiong; MENG Xueli; CHEN Yan; WEI Songjian; LÜ Lilan; HU Xiaochun (School of Computer, Electronics & Information, Guangxi University, Nanning, 530004; School of Business, Guangxi University, Nanning, 530004; Guangxi Subtropical Crops Research Institute, Nanning, 530001; Guangxi Key Laboratory of Big Data in Finance and Economics, Nanning, 530003)
Source
《基因组学与应用生物学》
CAS
CSCD
PKU Core (北大核心)
2024, Issue 8, pp. 1303-1320 (18 pages)
Genomics and Applied Biology
Funding
Supported by the National Natural Science Foundation of China (62362004) and the Guangxi Key Research and Development Program (桂科AB24010031).