Exploring protein's conformational space by using encoding layer supervised auto-encoder

Abstract: Protein function is closely related to protein structure and its dynamic changes. Molecular dynamics (MD) simulation is an important tool for studying protein dynamics by exploring conformational space; however, sampling that space with MD takes a long time, and key details may be missed when sampling is limited. In recent years, deep learning methods such as the auto-encoder and its variants have been coupled with MD to explore the conformational space of proteins. After being trained on MD trajectories, an auto-encoder extracts latent variables and can quickly generate new conformations from random numbers supplied in the low-dimensional latent space. However, some problems remain: the method places requirements on the quality of the training set, the explorable area is limited, and, because the extracted latent variables have no intuitive meaning, the sampling direction is undefined. In this work, we build an auto-encoder whose encoding layer is supervised by reaction coordinates (such as the distance between centers of mass), so that conformational exploration is guided along chosen directions. We also try to expand the explorable area by training on data generated by the model itself. Two multi-domain proteins, bacteriophage T4 lysozyme and adenylate kinase, are used to illustrate the method. Even when the training set consists only of under-sampled trajectories from short MD simulations, the supervised auto-encoder can still explore along the given reaction coordinates. The explored conformational space covers all the experimental structures of the two proteins and extends to regions far from the training sets. Verification by molecular dynamics and secondary structure calculations indicates that most of the explored conformations are plausible. The supervised auto-encoder provides a way to efficiently expand the explored conformational space of a protein with limited computational resources, although suitable reaction coordinates are required. By integrating appropriate reaction coordinates or experimental data, the supervised auto-encoder may serve as an efficient tool for exploring the conformational space of proteins.
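The idea of supervising the encoding layer with a reaction coordinate can be illustrated with a minimal sketch. The abstract does not give the network architecture or the loss function, so everything here is an assumption made for illustration: the function names, the mean-squared-error form of both terms, and the `weight` hyperparameter are hypothetical, and the center-of-mass distance is used only because the abstract names it as an example reaction coordinate.

```python
import math

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms; coords is a list of (x, y, z)."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )

def com_distance(coords_a, masses_a, coords_b, masses_b):
    """Example reaction coordinate: distance between the centers of mass of two domains."""
    return math.dist(center_of_mass(coords_a, masses_a),
                     center_of_mass(coords_b, masses_b))

def mse(u, v):
    """Mean squared error between two equal-length sequences of numbers."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def supervised_ae_loss(x, x_recon, latent, reaction_coords, weight=1.0):
    """Assumed objective: reconstruction error plus a penalty that ties the
    latent (encoding-layer) values to the reaction coordinates.
    `weight` is a hypothetical hyperparameter, not taken from the paper."""
    return mse(x, x_recon) + weight * mse(latent, reaction_coords)

# Toy data (not from the paper): two "domains" with unit masses.
domain_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain_b = [(1.0, 4.0, 0.0)]
rc = com_distance(domain_a, [1.0, 1.0], domain_b, [1.0])

# Toy input, reconstruction, and latent value for a single conformation.
loss = supervised_ae_loss(
    x=[1.0, 2.0], x_recon=[1.0, 4.0],
    latent=[3.0], reaction_coords=[rc],
    weight=0.5,
)
```

Under this assumed objective, the supervision term gives each latent dimension a physical meaning, so stepping a latent value would correspond to moving along the chosen reaction coordinate when the decoder generates new conformations.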

Authors: Chen Guang-Lin; Zhang Zhi-Yong (Department of Physics, University of Science and Technology of China, Hefei 230026, China)

Affiliation: Department of Physics, University of Science and Technology of China

Source: Acta Physica Sinica (物理学报), indexed in SCIE, EI, CAS, CSCD, and the Peking University Core Journals list; 2023, Issue 24, pp. 91-99 (9 pages)

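Funding: Supported by the National Key Research and Development Program of China (Grant No. 2021YFA1301504), the National Natural Science Foundation of China (Grant No. 91953101), and the Strategic Priority Research Program (Category B) of the Chinese Academy of Sciences (Grant No. XDB37040202).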

Keywords: protein conformational space; molecular dynamics simulation; machine learning; auto-encoder

Classification code: Q51 [Biology: Biochemistry]

