Abstract: One of the long-standing controversial arguments in protein folding is Levinthal's paradox. We have recently proposed a new nucleation hypothesis and shown that the nucleation residues are the most conserved sequences in proteins. To avoid the complicating effects of tertiary interactions, we limit our search for structural codes to the nucleation residues. Starting from the hypotheses of secondary structure nucleation and conservation of residues important for folding, we have analysed 762 folds classified as unique by SCOP. Segments of 17 residues around the top 20% most conserved amino acids are analysed, resulting in approximately 100 clusters for each of the main secondary structure classes: helix, sheet and coil. Helical clusters have the longest correlation range and coil clusters the shortest (four residues). Strong, specific sequence-structure correlation is observed for coil but not for helix or sheet, suggesting a mapping relationship between sequence and structure for coil. We propose that the central sequences in these clusters form 'structural codes', a useful basis set for identifying nucleation sites, protein fragments that are stable in isolation, and secondary structural patterns in proteins (particularly turns and loops).
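The segment-selection step described above (17-residue segments around the top 20% conserved amino acids) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `conserved_segments`, the per-residue `conservation` scores supplied as input, the choice to centre each window on the conserved residue, and the decision to skip windows that run past the chain ends are all assumptions made for illustration; the abstract does not say how conservation was scored or how chain termini were handled.

```python
from typing import List, Tuple


def conserved_segments(
    sequence: str,
    conservation: List[float],
    top_fraction: float = 0.20,  # "top 20% conserved" from the abstract
    window: int = 17,            # "segments of 17 residues" from the abstract
) -> List[Tuple[int, str]]:
    """Return (centre_index, segment) pairs for the most conserved residues.

    Assumptions (not stated in the abstract): scores are given per residue,
    higher means more conserved, windows are centred on the conserved
    residue, and windows that do not fit inside the chain are skipped.
    """
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have equal length")
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred")

    half = window // 2
    n_top = max(1, round(len(sequence) * top_fraction))

    # Rank positions by conservation score, highest first, and keep the top slice.
    ranked = sorted(range(len(sequence)), key=lambda i: conservation[i], reverse=True)
    top_positions = sorted(ranked[:n_top])

    segments = []
    for centre in top_positions:
        start, end = centre - half, centre + half + 1
        if start < 0 or end > len(sequence):
            continue  # assumed policy: drop windows truncated by a chain end
        segments.append((centre, sequence[start:end]))
    return segments
```

In a pipeline of the kind the abstract outlines, the returned segments would then be grouped by secondary structure class (helix, sheet, coil) and clustered; those later steps are not sketched here because the abstract does not specify the clustering method.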