Knowledge-based scoring functions have been widely used for protein structure prediction, protein-small molecule, and protein-nucleic acid interactions, in which one critical step is to find an appropriate representat...Knowledge-based scoring functions have been widely used for protein structure prediction, protein-small molecule, and protein-nucleic acid interactions, in which one critical step is to find an appropriate representation of protein structures. A key issue is to determine the minimal protein representations, which is important not only for developing of scoring func- tions but also for understanding the physics of protein folding. Despite significant progresses in simplifying residues into alphabets, few studies have been done to address the optimal number of atom types for proteins. Here, we have investigated the atom typing issue by classifying the 167 heavy atoms of proteins through 11 schemes with 1 to 20 atom types based on their physicochemical and functional environments. For each atom typing scheme, a statistical mechanics-based iterative method was used to extract atomic distance-dependent potentials from protein structures. The atomic distance-dependent pair potentials for different schemes were illustrated by several typical atom pairs with different physicochemical proper- ties. The derived potentials were also evaluated on a high-resolution test set of 148 diverse proteins for native structure recognition. It was found that there was a crossover around the scheme of four atom types in terms of the success rate as a function of the number of atom types, which means that four atom types may be used when investigating the basic folding mechanism of proteins. However, it was revealed by a close examination of typical potentials that 14 atom types were needed to describe the protein interactions at atomic level. The present study will be beneficial for the development of protein related scoring functions and the understanding of folding mechanisms.展开更多
基金Project supported by the National Natural Science Foundation of China(Grant No.31670724)the National Key Research and Development Program of China(Grant Nos.2016YFC1305800 and 2016YFC1305805)the Startup Grant of Huazhong University of Science and Technology,China
文摘Knowledge-based scoring functions have been widely used for protein structure prediction, protein-small molecule, and protein-nucleic acid interactions, in which one critical step is to find an appropriate representation of protein structures. A key issue is to determine the minimal protein representations, which is important not only for developing of scoring func- tions but also for understanding the physics of protein folding. Despite significant progresses in simplifying residues into alphabets, few studies have been done to address the optimal number of atom types for proteins. Here, we have investigated the atom typing issue by classifying the 167 heavy atoms of proteins through 11 schemes with 1 to 20 atom types based on their physicochemical and functional environments. For each atom typing scheme, a statistical mechanics-based iterative method was used to extract atomic distance-dependent potentials from protein structures. The atomic distance-dependent pair potentials for different schemes were illustrated by several typical atom pairs with different physicochemical proper- ties. The derived potentials were also evaluated on a high-resolution test set of 148 diverse proteins for native structure recognition. It was found that there was a crossover around the scheme of four atom types in terms of the success rate as a function of the number of atom types, which means that four atom types may be used when investigating the basic folding mechanism of proteins. However, it was revealed by a close examination of typical potentials that 14 atom types were needed to describe the protein interactions at atomic level. The present study will be beneficial for the development of protein related scoring functions and the understanding of folding mechanisms.