Abstract: The transcription start site (TSS) region shows greater variability than other promoter elements. We examine this variability using information content as a measure. We find that the variability is more significant in the block of 5 nucleotides (nt) surrounding the TSS than in the block of 15 nt, which suggests that the region actually involved is 5-10 nt in size. For Escherichia coli, information content computed from dinucleotide substitution matrices gives clearly better discrimination, suggesting the presence of short-range correlations. For human this effect is much weaker, and for mouse it is practically absent. We conclude that short-range correlations within the TSS region are species-dependent rather than universal. We also observe variable regions other than the TSS in the mitochondrial control element. Finally, effective comparisons can only be made on blocks; single-nucleotide comparisons give no detectable signal.
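To make the measures above concrete, the sketch below computes Shannon information content (IC) for each column of a set of aligned sequences, averages it over a block of columns, and derives a dinucleotide variant. It assumes the sequences are already aligned on the TSS and uses plain frequency-based entropy; the substitution-matrix-based measure used in the study is not reproduced, and all function names are hypothetical. The dinucleotide IC minus the two mononucleotide ICs equals the mutual information between adjacent positions, which is one way to quantify short-range correlations.

```python
import math
from collections import Counter


def positional_ic(seqs, k=1):
    """Shannon information content (bits) of the k-mers starting at each column.

    IC = log2(4**k) - H, where H is the entropy of the observed k-mers
    (k=1: nucleotides, k=2: dinucleotides). No small-sample correction.
    """
    max_bits = math.log2(4 ** k)
    n = len(seqs)
    ics = []
    for i in range(len(seqs[0]) - k + 1):
        counts = Counter(s[i:i + k] for s in seqs)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ics.append(max_bits - entropy)
    return ics


def block_ic(ics, center, width):
    """Mean IC over `width` columns centred on `center` (clipped at the ends)."""
    start = max(0, center - width // 2)
    block = ics[start:start + width]
    return sum(block) / len(block)


def adjacent_mutual_info(seqs):
    """Dinucleotide IC minus the two mononucleotide ICs for each column pair.

    This equals the mutual information I(X_i; X_{i+1}) in bits: zero when
    adjacent positions are independent, positive when they are correlated.
    """
    mono = positional_ic(seqs, k=1)
    di = positional_ic(seqs, k=2)
    return [di[i] - mono[i] - mono[i + 1] for i in range(len(di))]
```

For example, one could compare `block_ic(ics, tss_col, 5)` with `block_ic(ics, tss_col, 15)`; with few sequences the plug-in entropy estimate is biased, so a small-sample correction would normally be added.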
Abstract: In this paper we present NPEST, a novel tool for analyzing expressed sequence tag (EST) distributions and predicting transcription start sites (TSSs). The method estimates the unknown probability distribution of ESTs with a maximum likelihood (ML) approach and then uses this estimate to predict TSS positions. Accurate identification of TSSs is an important genomics task, because the position of regulatory elements relative to the TSS can strongly affect gene regulation, and because the performance of promoter motif-finding methods depends on correct TSS identification. Our probabilistic approach extends recognition to multiple TSSs per locus, which may help improve understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool, we analyzed 16520 loci and developed a TSS database, which is now publicly available at www.glaeombio.net/NPEST.
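The abstract does not give NPEST's estimator in detail. As a minimal sketch, assuming the input is a list of EST 5' end coordinates for one locus, the nonparametric ML estimate of a discrete distribution is simply the relative frequency of each observed value; dominant positions in that estimate are candidate TSSs, and more than one per locus corresponds to multiple TSSs. The `window` and `min_prob` parameters are hypothetical choices, not NPEST settings.

```python
from collections import Counter


def empirical_distribution(five_prime_ends):
    """Nonparametric ML estimate of a discrete distribution.

    For i.i.d. observations the likelihood is maximised by the relative
    frequency of each observed value (here, EST 5' end coordinates).
    """
    counts = Counter(five_prime_ends)
    n = len(five_prime_ends)
    return {pos: c / n for pos, c in sorted(counts.items())}


def candidate_tss(dist, window=5, min_prob=0.1):
    """Positions that dominate their +/- window neighbourhood.

    A position is reported if it carries at least min_prob of the mass, is
    greater than every neighbour to its left and at least as large as every
    neighbour to its right (so ties go to the leftmost position).
    """
    peaks = []
    for pos, p in dist.items():
        if p < min_prob:
            continue
        dominates = all(
            p > q if other < pos else p >= q
            for other, q in dist.items()
            if other != pos and abs(other - pos) <= window
        )
        if dominates:
            peaks.append(pos)
    return peaks
```

A real tool would typically also smooth the estimate or model positional error of EST ends; this sketch only shows the ML step and a simple peak rule.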
Funding: Financial support from the Swarnajayanti Fellowship scheme of the Department of Science and Technology, Government of India (Grant No. DST/SJF/ET-02/2006-07).
Abstract: MicroRNAs (miRNAs) are small endogenous non-coding RNAs about 22 nt long that play crucial roles in many biological processes. These short RNAs regulate the expression of mRNAs by binding to their 3'-UTRs, which leads to translational repression or mRNA degradation. Many current studies focus on how mature miRNAs regulate mRNAs; however, very little is known about their transcriptional loci. Primary miRNAs (pri-miRs) are first transcribed from DNA and then cleaved by endonuclease activity into precursor miRNAs (pre-miRs), which are finally processed into mature miRNAs. To date, many pre-miRs and mature miRNAs have been experimentally verified. Unfortunately, identification of pri-miR loci, promoters and associated transcription start sites (TSSs) is still in progress: TSSs have been reported for only about 40% of the known mature human miRNAs. This information, albeit limited, may be useful for further study of miRNA regulation. In this paper, we provide miRT, a novel database of validated miRNA TSSs compiled from several experimental studies that validate miRNA TSSs; the data are available for full download. We present miRT as a web server that can also convert TSS loci between different genome builds. miRT may be a valuable resource for advanced research on miRNA regulation and is freely accessible at: http://www.isical.ac.in/~bioinfo_miu/miRT/miRT.php.
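The abstract does not say how miRT converts TSS loci between genome builds. A common principle, used for example by UCSC liftOver chain files, is to map positions through ungapped aligned blocks between the two assemblies. The minimal sketch below assumes a hypothetical, sorted list of such blocks for a single chromosome; it ignores chromosome names, strand flips and multiple chains.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block between two genome builds (hypothetical data)."""
    src_start: int  # 0-based start in the source build
    dst_start: int  # 0-based start in the target build
    size: int       # block length in bases


def convert_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based source coordinate to the target build.

    blocks must be sorted by src_start and non-overlapping. Returns None
    when pos falls outside every block (e.g. in an unaligned gap).
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.size:
        return None
    return block.dst_start + offset
```

Returning None for positions in gaps mirrors the fact that some loci cannot be converted unambiguously between builds.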
Funding: This work was supported by the National Natural Science Foundation of China (No. 60374069).
Abstract: The identification of functional motifs in a DNA sequence is fundamentally a statistical pattern recognition problem. This paper introduces a new algorithm for recognizing functional transcription start sites (TSSs) in human genome sequences. The algorithm adopts a radial basis function (RBF) neural network together with an improved heuristic method for constructing 5-tuple feature variables, and it is implemented in two packages, RBFPromoter and ImpRBFPromoter, developed in Visual C++ 6.0. The algorithm is evaluated on several different test sequence sets. Compared with several other promoter recognition programs, it proves more flexible, with stronger learning ability and higher accuracy.
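As a rough illustration of an RBF network over 5-tuple features, the sketch below represents each sequence by its overlapping 5-mer frequencies (1024 features), places one Gaussian hidden unit on every training sequence, and solves the linear output weights exactly. This is a generic exact-interpolation RBF network, not the paper's improved feature heuristic or the RBFPromoter/ImpRBFPromoter code; `sigma` and all names are hypothetical, and sequences are assumed to contain only A, C, G and T.

```python
import math
from collections import Counter
from itertools import product

K = 5  # tuple (k-mer) length
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**5 = 1024 features


def kmer_profile(seq):
    """Relative frequencies of the overlapping 5-mers in an ACGT sequence."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts.values())
    return [counts[m] / total for m in KMERS]


def gaussian(x, c, sigma):
    """Gaussian RBF activation exp(-||x - c||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-d2 / (2 * sigma ** 2))


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


class RBFNet:
    """One Gaussian hidden unit per training sequence, linear output layer."""

    def __init__(self, sigma=0.2):
        self.sigma = sigma

    def fit(self, seqs, labels):
        self.centers = [kmer_profile(s) for s in seqs]
        phi = [[gaussian(x, c, self.sigma) for c in self.centers] for x in self.centers]
        # Exact interpolation: the weights reproduce the labels on the training set
        self.weights = solve(phi, [float(y) for y in labels])
        return self

    def score(self, seq):
        x = kmer_profile(seq)
        return sum(w * gaussian(x, c, self.sigma) for w, c in zip(self.weights, self.centers))
```

Exact interpolation overfits on real data; practical RBF networks usually use fewer centres (for example chosen by clustering) and fit the output weights by least squares.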
Funding: The National Natural Science Foundation of China (Nos. 1137420, 91129000, 21273148, 91229108, 31370750 and 21303104) and the National Basic Research Program (973) of China (No. 2010CB529205).
Abstract: With the completion of draft genome sequences, identifying functional elements in the genome has become an urgent task. Full-length cDNAs are an important resource for identifying genes and precisely determining their structural features, and they also provide a basis for defining genomic elements. Because many regulatory elements lie around transcription start sites (TSSs), precise localization of TSSs in the genome is a critical step in identifying the associated core promoters. Massively parallel snapshots of TSSs at a particular time under a specific experimental condition make it possible to analyze the regulatory elements around TSSs globally and to construct transcriptional regulatory networks. In this paper, we first review two important full-length cDNA cloning techniques: the cap-trapper technique and the oligo-capping technique. We then introduce deepCAGE, a TSS profiling technique based on cap trapping and deep sequencing, and its applications in research on transcriptional regulation.
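A typical downstream step in CAGE-based TSS profiling is to group the mapped 5' ends of sequenced tags into clusters of nearby TSSs and report the most frequently used position in each. The review does not prescribe a specific procedure; the sketch below is one simple distance-based clustering, assuming tags are already mapped to (chromosome, strand, position) tuples, with a hypothetical `max_gap` of 20 bp.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class TagCluster(NamedTuple):
    chrom: str
    strand: str
    start: int  # lowest tag 5' end coordinate in the cluster
    end: int    # highest tag 5' end coordinate in the cluster
    tags: int   # total tag count
    peak: int   # coordinate with the most tags (dominant TSS)


def _summarize(group):
    """Collapse a run of ((chrom, strand, pos), count) items into a TagCluster."""
    (chrom, strand, start), _ = group[0]
    end = group[-1][0][2]
    total = sum(count for _, count in group)
    # max() keeps the first maximum, so ties resolve to the lowest coordinate
    peak = max(group, key=lambda item: item[1])[0][2]
    return TagCluster(chrom, strand, start, end, total, peak)


def cluster_cage_tags(tags: List[Tuple[str, str, int]], max_gap: int = 20) -> List[TagCluster]:
    """Group CAGE tag 5' ends into clusters of nearby TSSs.

    Consecutive positions on the same chromosome and strand are merged
    while the distance to the previous position is at most max_gap.
    """
    counts = Counter(tags)
    clusters = []
    group = []
    for key in sorted(counts):
        chrom, strand, pos = key
        if group:
            prev_chrom, prev_strand, prev_pos = group[-1][0]
            if (chrom, strand) != (prev_chrom, prev_strand) or pos - prev_pos > max_gap:
                clusters.append(_summarize(group))
                group = []
        group.append((key, counts[key]))
    if group:
        clusters.append(_summarize(group))
    return clusters
```

Cluster tag counts, normalized by sequencing depth, can then be compared across conditions to study how TSS usage changes.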