Modeling the specificity of protein-DNA interactions (cited by: 4)

Abstract: The specificity of protein-DNA interactions is most commonly modeled using position weight matrices (PWMs). First introduced in 1982, they have been adapted to many new types of data, and many different approaches have been developed to determine the parameters of the PWM. New high-throughput technologies provide a large amount of data rapidly and offer an unprecedented opportunity to determine accurately the specificities of many transcription factors (TFs). But taking full advantage of the new data requires advanced algorithms that take into account the biophysical processes involved in generating the data. The new large datasets can also aid in determining when the PWM model is inadequate and must be extended to provide accurate predictions of binding sites. This article provides a general mathematical description of a PWM and how it is used to score potential binding sites, and a brief history of the approaches that have been developed and the types of data that are used, with an emphasis on algorithms that we have developed for analyzing high-throughput datasets from several new technologies. It also describes extensions that can be added when the simple PWM model is inadequate and further enhancements that may be necessary. Finally, it briefly describes some applications of PWMs in the discovery and modeling of in vivo regulatory networks.
Affiliation: Department of Genetics
Source: Frontiers of Electrical and Electronic Engineering in China (English edition), 2013, Issue 2, pp. 115-130 (16 pages)
