Funding: Supported by funds from the National Key R&D Program of China (2016YFC0901603), the China 863 Program (2015AA020108) and the State Key Laboratory of Protein and Plant Gene Research, and supported in part by the National Program for Support of Top-notch Young Professionals.
Abstract: Understanding the functional effects of genetic variants is crucial in modern genomics and genetics. Transcription factor binding sites (TFBSs) are among the most important cis-regulatory elements. Although multiple tools have been developed to assess the functional effects of genetic variants at TFBSs, they usually assume that each variant acts in isolation and neglect potential "interference" among multiple variants within the same TFBS. In this study, we present COPE-TFBS (Context-Oriented Predictor for variant Effect on Transcription Factor Binding Site), a novel method that considers sequence context to accurately predict variant effects on TFBSs. We systematically re-analyzed sequencing data from both the 1000 Genomes Project and the Genotype-Tissue Expression (GTEx) Project with COPE-TFBS. We identified a number of novel TFBSs, transformed TFBSs and discordantly annotated TFBSs resulting from multiple variants, further highlighting the necessity of sequence context for accurately annotating genetic variants.
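The "interference" problem can be illustrated with a toy position weight matrix (PWM). The sketch below is not COPE-TFBS. The 4-bp log-odds matrix, the binding cutoff `THRESHOLD` and all function names are hypothetical. It scores each variant alone and then all variants on the same haplotype together. This shows how two variants that each leave a site above the cutoff can jointly abolish it.

```python
# Toy illustration of variant "interference" within one TFBS.
# Hypothetical log-odds PWM (one dict per position); values are made up.
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.1, "G": -0.8, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.3, "T": -0.7},
    {"A": -0.6, "C": -1.0, "G": -1.0, "T": 1.0},
]
THRESHOLD = 2.0  # hypothetical score cutoff for calling a site "bound"


def score(site: str) -> float:
    """Sum the per-position log-odds scores of a site with the PWM's length."""
    return sum(PWM[i][base] for i, base in enumerate(site))


def is_bound(site: str) -> bool:
    return score(site) >= THRESHOLD


def apply_variants(site: str, variants: dict[int, str]) -> str:
    """Return the site with every SNV (offset -> alternate base) applied."""
    bases = list(site)
    for offset, alt in variants.items():
        bases[offset] = alt
    return "".join(bases)


def compare_calls(ref_site: str, variants: dict[int, str]):
    """Binding call for each variant alone versus all variants together."""
    isolated = {
        offset: is_bound(apply_variants(ref_site, {offset: alt}))
        for offset, alt in variants.items()
    }
    joint = is_bound(apply_variants(ref_site, variants))
    return isolated, joint


isolated, joint = compare_calls("ACGT", {0: "T", 2: "T"})
print(isolated, joint)  # {0: True, 2: True} False
```

Each variant alone lowers the score but keeps the site above the cutoff. Applied together on one haplotype, they drop it below. A per-variant annotation would therefore miss the lost site.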
Funding: Supported by the National Science Foundation (#DBI-0844749 and #DBI-1356459 to JTG).
Abstract: Transcription factors (TFs) are a highly diverse family of DNA-binding proteins that play essential roles in regulating gene expression by binding to specific DNA sequences. They are considered prime drug targets because mutations and aberrant TF-DNA interactions are implicated in many diseases. Identifying TF-binding sites on a genomic scale is a critical step in delineating transcriptional regulatory networks and remains a major goal of genome annotation. Recently developed high-throughput experimental technologies have provided valuable genome-scale information about TF-binding sites under various physiological and developmental conditions. Computational approaches can offer a cost-effective alternative to, and complement, experimental methods by exploiting the vast quantities of available sequence or structural information. In this review we focus on structure-based prediction of transcription factor binding sites. Beyond their potential for genome-scale prediction, structure-based approaches can help us better understand the mechanisms of TF-DNA interaction and the evolution of transcription factors and their target binding sites. The success of structure-based methods also has a translational impact on targeted drug design in medicine and biotechnology.
Funding: Supported by the Higher Education Commission, Pakistan (Grant No. 20-1493/R&D/09).
Abstract: Recent advances in high-throughput tools have revolutionized our understanding of the molecular mechanisms underlying normal and dysfunctional biological processes. Here we present a novel computational tool, the transcription factor search and analysis tool (TrFAST), developed for the in silico analysis of transcription factor binding sites (TFBSs) of signaling pathway-specific TFs. TrFAST supports both searching and comparative analysis of regulatory motifs through an exact pattern matching algorithm. Matched binding sites are then displayed graphically for multiple sequences up to 50 kb in length. The exact pattern matching strategy allows TrFAST to reduce the number of comparisons. Unlike pre-existing tools that find TFBSs in a single sequence, TrFAST searches for the desired pattern in multiple sequences simultaneously. It computes the GC content of the given multiple-sequence data set and assembles the combinational details of the consensus sequence(s) located in these regions. From this it generates a visual display based on the abundance of each unique pattern. Simultaneous comparative analysis of the regulatory regions of multiple orthologous sequences further extends TrFAST and provides significant insight into the conservation of non-coding cis-regulatory elements. TrFAST is freely available at http://www.fi-pk.com/trfast.html.
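Below is a minimal sketch of exact pattern matching over several sequences, with GC content reported per sequence. It uses Python's built-in `str.find` rather than whatever matching algorithm TrFAST implements. The function names and the example AP-1-like pattern `TGACTCA` are illustrative assumptions.

```python
def find_all(seq: str, pattern: str) -> list[int]:
    """Return 0-based start positions of every exact (possibly overlapping) match."""
    hits = []
    start = seq.find(pattern)
    while start != -1:
        hits.append(start)
        start = seq.find(pattern, start + 1)  # +1 so overlapping matches are kept
    return hits


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def scan_sequences(seqs: dict[str, str], patterns: list[str]) -> dict[str, dict]:
    """Search every pattern in every sequence and report GC content per sequence."""
    report = {}
    for name, seq in seqs.items():
        s = seq.upper()  # case-insensitive matching
        report[name] = {
            "gc": gc_content(s),
            "hits": {p: find_all(s, p.upper()) for p in patterns},
        }
    return report


# Hypothetical input sequences, for illustration only.
seqs = {"seq1": "ATGACTCAGGTGACTCA", "seq2": "ccTGACTCAgg"}
print(scan_sequences(seqs, ["TGACTCA"]))
```

Restarting the search one base after each hit keeps overlapping occurrences. That matters for short or self-similar motifs.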
Funding: Supported by the Yat-Sen Innovative Talents Cultivation Program for Excellent Tutors.
Abstract: Transcription factor (TF) binding to its DNA target site plays an essential role in gene regulation. The location, orientation and spacing of transcription factor binding sites (TFBSs) also affect the regulatory function of the TF. However, how the nucleosomal context of TFBSs influences TF binding and subsequent gene regulation remains to be elucidated. We used genome-wide nucleosome positioning and TF binding data in budding yeast. We found that the binding affinities of TFs to DNA tend to decrease with increasing nucleosome occupancy of the associated binding sites. We further demonstrated that the nucleosomal context of binding sites is correlated with the regulation of genes by the corresponding TF. Nucleosome-depleted TFBSs are associated with high gene activity and low expression noise, whereas nucleosome-covered TFBSs are associated with low gene activity and high expression noise. Moreover, nucleosome-covered TFBSs tend to disrupt coexpression of the corresponding TF's target genes. We conclude that the nucleosomal context of binding sites influences TF binding affinity and thereby affects how TFs regulate their target genes. This emphasizes the need to include the nucleosomal context of TFBSs when modeling gene regulation.
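The reported trend, with affinity decreasing as occupancy increases, is the kind of monotonic relationship a rank correlation can test without assuming linearity. Below is a sketch of Spearman's rho, computed as the Pearson correlation of average ranks. The inputs are hypothetical per-site values, not the yeast data used in the study. The statistic the authors actually used is not stated in the abstract.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)  # undefined if either input is constant


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site values, for illustration only.
occupancy = [0.1, 0.3, 0.5, 0.9]
affinity = [5.0, 3.2, 2.1, 0.4]
print(spearman(occupancy, affinity))  # -1.0
```

A negative rho indicates that higher occupancy goes with lower affinity. Real data would also need a significance test and enough sites to make one meaningful.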
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 70671089) and the Key Important Project (No. 10635040).
Abstract: Transcription factor binding sites (TFBSs) play key roles in gene expression and regulation. They are short sequence segments with a definite structure that are specifically recognized by the corresponding transcription factors. From a statistical viewpoint, candidate TFBSs should differ markedly from random combinations of nucleotides. This paper proposes a combined statistical model for finding over-represented short sequence segments in different kinds of data sets. When an over-represented short sequence segment is described by a position weight matrix, the nucleotide distribution at most positions of the segment should be far from the background nucleotide distribution. The central idea of this approach is to search for such signals. The algorithm is tested on three data sets:
- binding sites of the cyclic AMP receptor protein in E. coli;
- PlantProm DB, a non-redundant collection of proximal promoter sequences from different species;
- the intergenic sequences of the whole E. coli genome.

Although the complexity of these three data sets differs considerably, the results show that the model is general and sensible.
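One way to quantify how far each PWM column lies from the background is relative entropy (Kullback-Leibler divergence), D_i = sum over bases b of p_i(b) log2(p_i(b) / q(b)), where p_i is column i and q is the background. The sketch below builds a PWM from aligned sites with a pseudocount. It then reports the fraction of columns whose divergence exceeds a cutoff. It illustrates the stated idea only. The pseudocount, the `min_bits` cutoff and the function names are assumptions, not the paper's combined model.

```python
import math
from collections import Counter

BASES = "ACGT"


def pwm_from_sites(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Column-wise base probabilities from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def column_divergence(pwm, background: dict[str, float]) -> list[float]:
    """Relative entropy (bits) of each PWM column from the background distribution."""
    return [sum(col[b] * math.log2(col[b] / background[b]) for b in BASES) for col in pwm]


def informative_fraction(pwm, background, min_bits: float = 0.5) -> float:
    """Fraction of columns whose divergence from background is at least min_bits."""
    d = column_divergence(pwm, background)
    return sum(x >= min_bits for x in d) / len(d)


# Hypothetical aligned sites and a uniform background, for illustration only.
sites = ["TGTGA", "TGTGA", "AGTGA", "TGTGC"]
bg = {b: 0.25 for b in BASES}
pwm = pwm_from_sites(sites)
print(column_divergence(pwm, bg))
print(informative_fraction(pwm, bg))
```

The pseudocount keeps every probability positive, so the logarithm is always defined. It also shrinks the divergence of columns supported by only a few sites.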
Funding: Diabetes Special Fund Project of Hubei University of Science and Technology (2016-18XZ12).
Abstract: [Objectives] This study was conducted to investigate the characteristics of the human TCF7L2 gene promoter. [Methods] The 2000 bp sequence of the 5′ regulatory region of the human TCF7L2 gene was obtained from the UCSC genome database. The promoter, transcription factor binding sites, CpG islands, SNPs and other features were analyzed with a variety of online software tools. [Results] The bioinformatics analysis showed at least 5 potential promoters on the positive-sense strand of the 2000 bp sequence. Of these, the regions from -242 to -192 bp and from -853 to -803 bp may contain core promoters. A TATA box and a 499 bp CpG island were found. AliBaba2.1, PROMO and JASPAR predicted 241, 944 and 1035 (positive-sense strand) transcription factor binding sites, respectively. Using the CONREAL program, 207 common transcription factor binding sites, involving 66 transcription factors, were identified in the conserved region of the homologous human and mouse TCF7L2 gene promoters. Two SNPs were found in the promoter region. [Conclusions] The promoter of the human TCF7L2 gene was analyzed by bioinformatics, and its characteristics were obtained.
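The abstract does not say which criteria defined the 499 bp CpG island. A commonly used definition (Gardiner-Garden and Frommer) has three requirements:
- a length of at least 200 bp;
- GC content above 50%;
- an observed/expected CpG ratio above 0.6, where observed/expected = CpG count × length / (C count × G count).

The sliding-window sketch below applies those thresholds. The window size, the step and the function names are assumptions. Online tools may add further rules, such as merging overlapping windows into one island.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """GC fraction and observed/expected CpG ratio of a window."""
    s = window.upper()
    c, g = s.count("C"), s.count("G")
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    gc = (c + g) / len(s) if s else 0.0
    obs_exp = cpg * len(s) / (c * g) if c and g else 0.0
    return gc, obs_exp


def is_cpg_island(window: str, min_len: int = 200, min_gc: float = 0.5,
                  min_obs_exp: float = 0.6) -> bool:
    """Apply the length, GC-content and CpG observed/expected criteria."""
    if len(window) < min_len:
        return False
    gc, obs_exp = cpg_stats(window)
    return gc > min_gc and obs_exp > min_obs_exp


def cpg_island_windows(seq: str, window: int = 200, step: int = 1) -> list[int]:
    """0-based start positions of windows that pass all three criteria."""
    return [
        i
        for i in range(0, len(seq) - window + 1, step)
        if is_cpg_island(seq[i:i + window], min_len=window)
    ]


toy = "AT" * 100 + "CG" * 100  # hypothetical 400 bp sequence
hits = cpg_island_windows(toy)
print(hits[0], hits[-1])  # 101 200
```

Strict inequalities follow the usual wording ("above 50%"). A window that is exactly half GC therefore does not qualify.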
Funding: This work was supported in part by CRIS Projects (No. 1265-31000-090-00D and 1265-31000-081-00D) from the US Department of Agriculture and by NIH Grant DK-25541 (to RWH). JY was supported by the NIH Metabolism Training Program (DK-07139).
Abstract: We applied a systematic phylogenetic footprinting approach to identify conserved transcription factor binding sites (TFBSs) in mammalian promoter regions using human, mouse and rat sequence alignments. We found that the score distributions of most binding site models did not follow the Gaussian distribution required by many statistical methods. We therefore performed an empirical test to establish the optimal threshold for each model. We evaluated our computational predictions against previously known TFBSs in the promoter of PCK1, the gene encoding the cytosolic isoform of phosphoenolpyruvate carboxykinase. The predictions achieved a sensitivity of 75% and a specificity of approximately 32%. Almost all known sites overlapped with predicted sites, and several new putative TFBSs were also identified. We validated a predicted SP1 binding site involved in the control of PCK1 transcription using gel shift and reporter assays. Finally, we applied our computational approach to predict putative TFBSs within the promoter regions of all available RefSeq genes. Our full set of TFBS predictions is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
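Because the score distributions were not Gaussian, a cutoff derived from the mean and standard deviation can be misleading. A distribution-free alternative takes an empirical quantile of scores obtained on background sequences, for example shuffled or non-conserved ones. The sketch below shows that idea only; the authors' exact empirical test is not described in the abstract. The `alpha` level, the add-one smoothing in the p-value and the function names are assumptions.

```python
import math
import random
import statistics


def empirical_threshold(background_scores: list[float], alpha: float = 0.01) -> float:
    """Score cutoff such that at most a fraction `alpha` of background scores
    lie strictly above it. No distributional (e.g. Gaussian) assumption is made."""
    ordered = sorted(background_scores)
    n = len(ordered)
    allowed_above = min(math.floor(alpha * n), n - 1)
    return ordered[n - 1 - allowed_above]


def empirical_pvalue(score: float, background_scores: list[float]) -> float:
    """Fraction of background scores >= score, with add-one smoothing."""
    exceed = sum(s >= score for s in background_scores)
    return (exceed + 1) / (len(background_scores) + 1)


# Skewed, non-Gaussian toy background scores (hypothetical, for illustration).
rng = random.Random(0)
background = [rng.expovariate(1.0) for _ in range(10000)]
empirical_cut = empirical_threshold(background, alpha=0.01)
# Cutoff a Gaussian assumption would give at the same one-sided 1% level.
gaussian_cut = statistics.mean(background) + 2.326 * statistics.stdev(background)
print(round(empirical_cut, 2), round(gaussian_cut, 2))
```

For this right-skewed toy background, the Gaussian-based cutoff is expected to fall well below the empirical one. It would therefore admit more false positives than the nominal 1%.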
Funding: National Institutes of Health (Grant Nos. RO1 CA118947 and RO1 CA152826 to Ah-Ng Tony Kong, and R21 CA133675 to Li Cai).
Abstract: To understand the organization of the biological networks that might govern the pathogenesis of hormone-refractory prostate cancer (HRPC), we investigated transcriptional circuitry and signaling in five prostate cancer (PCa) cell lines: androgen-dependent 22Rv1 and MDA PCa 2b, androgen- and estrogen-dependent LNCaP, and androgen-independent DU 145 and PC-3. We used microarray analyses, quantitative real-time PCR, pathway prediction analyses and determination of transcription factor binding site (TFBS) signatures to dissect HRPC regulatory networks. We generated graphical representations of global topology and local network motifs that might be important in prostate carcinogenesis. Many important putative biomarker 'target hubs' were identified, including AP-1, NF-κB, EGFR, ERK1/2, JNK, p38 MAPK, TGF-β, VEGF, PDGF, CD44, Akt, PI3K, NOTCH1, CASP1, MMP2 and AR. Our results suggest that complex cellular events, including autoregulation, feedback loops and cross-talk, might govern several transitions. These include progression from early lesions to clinically diagnosed PCa, the metastatic potential of pre-existing high-grade prostate intraepithelial neoplasia (HG-PIN) and/or advancement to HRPC. The identification of TFBS signatures for TCF/LEF, SOX9 and ELK1 in the regulatory elements suggests additional biomarkers for the potential development of chemopreventive/therapeutic strategies against PCa. Taken together, we have identified putative biomarker 'target hubs' in the architecture of PCa signaling networks. We have also investigated TFBS signatures that might enhance our understanding of key regulatory nodes in the progression and pathogenesis of HRPC.
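The network terms used here (hubs, autoregulation, feedback loops) can be made concrete on a directed edge list. The sketch below ranks nodes by total degree and lists self-loops and two-node feedback loops. The node names and edges in the usage example are hypothetical placeholders, not interactions reported in the study. Degree is only one of several possible hub criteria.

```python
from collections import Counter


def network_summary(edges: list[tuple[str, str]], n_hubs: int = 2):
    """Return (top hubs by total degree, autoregulated nodes, 2-node feedback loops)."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(out_deg) | set(in_deg)
    total = {n: out_deg[n] + in_deg[n] for n in nodes}
    # Sort by descending degree, then by name so ties are deterministic.
    hubs = sorted(nodes, key=lambda n: (-total[n], n))[:n_hubs]
    edge_set = set(edges)
    # Autoregulation: a node that regulates itself (self-loop).
    autoregulated = sorted({src for src, dst in edge_set if src == dst})
    # Two-node feedback loop: A regulates B and B regulates A.
    feedback = sorted(
        {tuple(sorted((s, d))) for s, d in edge_set if s != d and (d, s) in edge_set}
    )
    return hubs, autoregulated, feedback


# Hypothetical directed regulatory edges (source regulates target).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("C", "C"), ("D", "E")]
print(network_summary(edges))  # (['A', 'C'], ['C'], [('A', 'B')])
```

A self-loop counts toward both in-degree and out-degree here, so autoregulated nodes get a small degree boost. Other conventions are equally defensible.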