
Protein structural classification and family identification by multifractal analysis and wavelet spectrum

Abstract: Family identification is helpful for predicting protein functions. It is known from the literature that longer sequences of base pairs or amino acids are required to study patterns in biological sequences. Since most protein sequences are relatively short, we randomly concatenate (link) protein sequences from the same family or superfamily to form longer protein sequences. The 6-letter model, 12-letter model, 20-letter model, the revised Schneider and Wrede hydrophobicity scale, solvent accessibility and stochastic standard state accessibility are used to convert the linked protein sequences into numerical sequences. Multifractal analyses and wavelet analysis are then performed on these numerical sequences. The parameters from these analyses are used to construct parameter spaces in which each linked protein is represented by a point. The four classes of proteins, namely the α, β, α+β and α/β classes, are then distinguished in these parameter spaces. The Fisher linear discriminant algorithm is used to assess the discriminant accuracy. Numerical results indicate that the discriminant accuracies are satisfactory in separating these classes. We find that linked proteins from the same family or superfamily tend to group together and can be separated from other linked proteins. The methods are helpful for identifying the family of an unknown protein.
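The pipeline described in the abstract (link sequences, convert them to numbers, place each linked protein as a point in a parameter space, then separate classes with a Fisher linear discriminant) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption for illustration: the alphabetical index encoding stands in for the paper's 20-letter model, the toy 2-D points stand in for the multifractal and wavelet parameters (which are not computed here), and the function names are hypothetical.

```python
import random

# Hypothetical 20-letter encoding: each amino acid maps to its 1-based index
# in alphabetical order. This is an assumption, not the paper's assignment.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

def link_sequences(seqs, rng):
    """Concatenate sequences from one family in random order."""
    order = list(seqs)
    rng.shuffle(order)
    return "".join(order)

def to_numeric(seq):
    """Convert a protein string into a numerical sequence."""
    return [CODE[aa] for aa in seq]

def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(2)]

def scatter(points, m):
    """2x2 scatter matrix of 2-D points around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Explicit inverse of the 2x2 within-class scatter matrix applied to diff.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    # Threshold halfway between the projected class means.
    proj_a = w[0] * ma[0] + w[1] * ma[1]
    proj_b = w[0] * mb[0] + w[1] * mb[1]
    return w, 0.5 * (proj_a + proj_b)

def classify(w, threshold, p):
    """Assign a point to class 'A' or 'B' by its projection onto w."""
    return "A" if w[0] * p[0] + w[1] * p[1] > threshold else "B"

# Toy usage: link three made-up sequences and encode the result.
rng = random.Random(0)
family = ["ACDE", "FGHIK", "LMN"]
linked = link_sequences(family, rng)
numeric = to_numeric(linked)

# Toy parameter-space points for two classes (invented values).
class_a = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4)]
class_b = [(3.0, 0.5), (3.4, 0.9), (2.8, 0.2)]
w, threshold = fisher_discriminant(class_a, class_b)
hits = sum(classify(w, threshold, p) == "A" for p in class_a)
hits += sum(classify(w, threshold, p) == "B" for p in class_b)
accuracy = hits / (len(class_a) + len(class_b))
print("resubstitution accuracy:", accuracy)
```

The accuracy printed here is a resubstitution figure on invented points, so it says nothing about the accuracies reported in the paper; it only shows how a discriminant accuracy of this kind could be computed once each linked protein has been reduced to a point.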
Source: Chinese Physics B (English edition) (indexed in SCIE, EI, CAS, CSCD), 2011, Issue 1, pp. 194-208 (15 pages)
Funding: Project supported by the Australian Research Council (Grant No. DP0559807); a Research Capacity Building Award at QUT; the Scientific Research Fund of Hunan Provincial Education Department of China (Grant No. 06C826); the Chinese Program for New Century Excellent Talents in University (Grant No. NCET-08-06867); the Hunan Provincial Natural Science Foundation of China (Grant No. 10JJ7001); the Program for Furong Scholars of Hunan Province of China; and the Aid Program for Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province of China.
Keywords: protein family, multifractal analysis, wavelet spectrum
