Funding: Financial support from the Swarnajayanti Fellowship scheme of the Department of Science and Technology, Government of India (Grant No. DST/SJF/ET-02/2006-07).
Abstract: MicroRNAs (miRNAs) are small endogenous non-coding RNAs of about 22 nt in length that play crucial roles in many biological processes. These short RNAs regulate the expression of mRNAs by binding to their 3'-UTRs or by translational repression. Many current studies focus on how mature miRNAs regulate mRNAs; however, very limited knowledge is available regarding their transcriptional loci. It is known that primary miRNAs (pri-miRs) are first transcribed from the DNA, followed by the formation of precursor miRNAs (pre-miRs) by endonuclease activity, which finally produces the mature miRNAs. To date, many of the pre-miRs and mature miRNAs have been experimentally verified. Unfortunately, identification of the loci of pri-miRs, promoters and associated transcription start sites (TSSs) is still in progress. TSSs of only about 40% of the known mature miRNAs in human have been reported. This information, albeit limited, may be useful for further study of the regulation of miRNAs. In this paper, we provide a novel database of validated miRNA TSSs, miRT, built by collecting data from several experimental studies that validate miRNA TSSs; the data are available for full download. We present miRT as a web server, which can also convert TSS loci between different genome builds. miRT might be a valuable resource for advanced research on miRNA regulation and is freely accessible at http://www.isical.ac.in/~bioinfo_miu/miRT/miRT.php.
Abstract: The transcription start site (TSS) region shows greater variability than other promoter elements. We examine this variability using information content as a measure. We note in this study that the variability is significant in the block of 5 nucleotides (nt) surrounding the TSS region compared with the block of 15 nt. This suggests that the actual region that may be involved is in the range of 5-10 nt in size. For Escherichia coli, we note that the information content from dinucleotide substitution matrices clearly shows better discrimination, suggesting the presence of some correlations. However, for human this effect is much weaker, and for mouse it is practically absent. We conclude that the presence of short-range correlations within the TSS region is species-dependent and not universal. We further observe that there are other variable regions in the mitochondrial control element apart from the TSS. It is also noted that effective comparisons can only be made on blocks, while single-nucleotide comparisons do not give any detectable signal.
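To make the block-wise comparison concrete, the following is a minimal sketch of a Shannon-style information content summed over a block of aligned positions. It is an illustration under stated assumptions, not the authors' procedure: it uses mononucleotide frequencies only (the abstract also uses dinucleotide substitution matrices), applies no small-sample correction, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one aligned column: log2(|alphabet|) - entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(len(alphabet)) - entropy


def block_information(sequences, start, width):
    """Sum the per-column information over `width` columns beginning at `start`."""
    return sum(
        column_information([seq[i] for seq in sequences])
        for i in range(start, start + width)
    )


# Hypothetical toy alignment (not real promoter data): three conserved
# columns, one partly variable column, one fully variable column.
toy_alignment = ["ACGTA", "ACGTC", "ACGAG", "ACGCT"]

conserved_block = block_information(toy_alignment, 0, 3)  # columns 0-2
variable_block = block_information(toy_alignment, 3, 2)   # columns 3-4
```

In this toy case the conserved block carries 6 bits and the variable block 0.5 bits, so a low block score marks a variable region, which is the kind of block-level contrast the abstract describes.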
Funding: The National Natural Science Foundation of China (Nos. 1137420, 91129000, 21273148, 91229108, 31370750 and 21303104) and the National Basic Research Program (973) of China (No. 2010CB529205).
Abstract: With the completion of draft genome sequences, identification of functional elements in the genome has become an urgent task. Full-length cDNAs provide an important resource for gene identification and for precise determination of gene structure. They also provide a basis for defining genomic elements. As many regulatory elements lie around transcription start sites (TSSs), precise localization of TSSs in the genome becomes a critical step for identifying the associated core promoters. Massively parallel snapshots of TSSs at a particular time under a specific experimental condition make it possible to globally analyze important regulatory elements around TSSs and further construct transcriptional regulatory networks. In this paper, we first review two important full-length cDNA cloning techniques: the cap-trapper technique and the oligo-capping technique. We then introduce deepCAGE, a cap-trapper and deep sequencing-based TSS profiling technique, and its applications in research on transcriptional regulation.
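As an illustration of how sequenced 5' tag positions can be turned into TSS calls, the sketch below groups nearby tag start coordinates into clusters and reports the most frequent position in each. This is a simple distance-based grouping chosen for illustration, not the deepCAGE analysis pipeline itself; the `max_gap` threshold, the function names and the toy coordinates are all assumptions.

```python
from collections import Counter


def _summarize(cluster):
    """Return (start, end, tag_count, dominant_position) for one sorted cluster."""
    counts = Counter(cluster)
    # Dominant TSS: the most frequent position; ties go to the smaller coordinate.
    dominant = min(counts, key=lambda p: (-counts[p], p))
    return (cluster[0], cluster[-1], len(cluster), dominant)


def cluster_tag_starts(positions, max_gap=20):
    """Group 5' tag coordinates (one chromosome, one strand) into clusters.

    A new cluster starts whenever the distance to the previous tag
    exceeds `max_gap` (a hypothetical threshold).
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            clusters.append(_summarize(current))
            current = []
        current.append(pos)
    if current:
        clusters.append(_summarize(current))
    return clusters


# Hypothetical tag 5' ends; repeated values mean several tags start there.
toy_tags = [100, 101, 101, 101, 103, 250, 252, 252]
clusters = cluster_tag_starts(toy_tags)
```

Each returned tuple describes one candidate promoter region, with the dominant position serving as a single representative TSS for that region.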
Abstract: In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. This method estimates an unknown probability distribution of ESTs using a maximum likelihood (ML) approach, which is then used to predict the positions of TSSs. Accurate identification of TSSs is an important genomics task, since the position of regulatory elements with respect to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct identification of TSSs. Our probabilistic approach expands recognition capabilities to multiple TSSs per locus, which may be a useful tool to enhance the understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16520 loci and developed a database of TSSs, which is now publicly available at www.glaeombio.net/NPEST.
Funding: This work was supported by the National Natural Science Foundation of China (No. 60374069).
Abstract: The identification of functional motifs in a DNA sequence is fundamentally a statistical pattern recognition problem. This paper introduces a new algorithm for the recognition of functional transcription start sites (TSSs) in human genome sequences. The algorithm adopts an RBF neural network, and an improved heuristic method for 5-tuple feature viable construction is proposed. Both are implemented in two packages, RBFPromoter and ImpRBFPromoter, developed in Visual C++ 6.0. The algorithm is evaluated on several different test sequence sets. Compared with several other promoter recognition programs, this algorithm proves to be more flexible, with stronger learning ability and higher accuracy.
Funding: Supported by the National High-Tech Research and Development Program of China (863 Program, No. 2002AA628050) and the National Natural Science Foundation of China (No. 30270031).
Abstract: Dunaliella salina, a halotolerant unicellular green alga without a rigid cell wall, can live in salinities ranging from 0.05 to 5 mol/L NaCl. These features make D. salina an ideal host for the production of antibodies, oral vaccines, and commercially valuable polypeptides. To produce high levels of heterologous proteins from D. salina, highly efficient promoters are required to drive expression of target genes under controlled conditions. In the present study, we cloned a 1.4 kb 5' flanking region from the carbonic anhydrase (CAH) gene of D. salina by genomic walking and PCR. The fragment was ligated to the pMD18-T vector and characterized. Sequence analysis indicated that this region contained conserved motifs, including a TATA-like box and a CAAT-box. Tandem (GT)n repeats, which have a potential role in transcriptional control, were also found in this region. The transcription start site (TSS) of the CAH gene was determined by 5' RACE and nested PCR. Transformation assays showed that the 1.4 kb fragment was able to drive expression of the selectable bar (bialaphos resistance) gene when the fusion was transformed into D. salina by biolistics. Northern blot hybridization showed that the bar transcript was most abundant in cells grown in 2 mol/L NaCl and less abundant in 0.5 mol/L NaCl, indicating that expression of the bar gene was induced at high salinity. These results suggest the potential use of the CAH gene promoter to induce the expression of heterologous genes in D. salina under varied salt conditions.
Funding: This research was supported by foundations from the Chinese Academy of Sciences and the Special Funds for Major State Basic Research of China (G19990539).
Abstract: Glutamate transporter EAAC1 removes the excitatory neurotransmitter in the central nervous system and also absorbs glutamate in epithelia of the intestine, kidney, liver and heart for normal cell growth. When a mouse cDNA library was screened in our lab using an EAAC1 cDNA fragment as a probe, a transcript (GenBank U75214) encoding an EAAC1 protein with 148 residues truncated at the N-terminus was cloned and named EAAC2. Sequence analysis shows that EAAC2 has its own start codon and a unique 5'UTR that differs from that of EAAC1. A mouse genomic library was screened and a positive clone including the EAAC1 CDS was sequenced (GenBank AF322393). The sequence indicates that the normal EAAC1 transcript (GenBank U73521) is transcribed from 10 exons (exons I, II, III, IV, V, VI, VII, VIII, IX and X), whereas the EAAC2 transcript consists of exons IV to IX, the same as those of EAAC1, together with its unique exon β upstream of exon IV and exon δ downstream of exon IX. The EAAC2 transcript has a cluster of transcriptional start sites that does not overlap with the transcriptional start sites of EAAC1. These results indicate that EAAC2 is transcribed from an independent promoter rather than produced by an alternative splicing event.
Funding: Supported in part by grants from the Office of Research Administration (ORA) at King Abdullah University of Science and Technology (KAUST) (Grant Nos. BAS/1/1624-01-01, FCC/1/197604-01, URF/1/4098-01-01, REI/1/0018-01-01, REI/1/4216-0101, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4352-01-01, REI/1/4742-01-01, and URF/1/4663-01-01); supported in part by the National Natural Science Foundation of China (Grant No. 31970601), the Shenzhen Science and Technology Program (Grant No. KQTD20180411143432337), and the Shenzhen Key Laboratory of Gene Regulation and Systems Biology (Grant No. ZDSYS20200811144002008), China.
Abstract: The accurate annotation of transcription start sites (TSSs) and their usage is critical for the mechanistic understanding of gene regulation in different biological contexts. To this end, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs based solely on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, which results in a large number of false positive predictions when applied at the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotation on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316.
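The point about balanced training sets and genome-scale false positives follows from simple arithmetic, sketched below. The sensitivity, false positive rate and TSS prevalence used here are hypothetical numbers chosen for illustration; none of them comes from the DeeReCT-TSS paper.

```python
def precision(sensitivity, false_positive_rate, prevalence):
    """Fraction of positive calls that are true, given the share of true sites.

    precision = TP / (TP + FP), with TP and FP expressed per scanned position.
    """
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical classifier in both settings: 95% sensitivity, 5% FPR.
balanced = precision(0.95, 0.05, prevalence=0.5)      # 1:1 benchmark set
genome_scale = precision(0.95, 0.05, prevalence=1e-4)  # 1 true TSS per 10,000 positions
```

With these assumed values the classifier reaches a precision of 0.95 on the balanced benchmark but falls below 1% when true TSSs make up one in ten thousand scanned positions, which is why performance on balanced data can overstate genome-wide usefulness.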