Decoding the Structural Keywords in Protein Structure Universe

Abstract: Although the protein sequence-structure gap continues to widen due to the development of high-throughput sequencing tools, the protein structure universe tends to be complete, with no proteins with novel structural folds deposited in the Protein Data Bank (PDB) recently. In this work, we identify a protein structural dictionary (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues as the structural "keywords" that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary with about 400 4- to 20-residue Frag-K fragments is capable of classifying major SCOP folds with high accuracy.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2019, Issue 1, pp. 3-15 (13 pages). Indexed in SCIE, EI, CSCD.
Funding: The National Natural Science Foundation of China under Grant Nos. 61728211 and 61832019.