[Objective] To examine a grammar model based on lexical substring extraction for RNA secondary structure prediction. [Method] By introducing the cloud model into the stochastic grammar model, a machine learning algorithm suitable for the lexicalized stochastic grammar model was proposed. The word grid mode was used to extract and divide the RNA sequence into lexical substrings, and the cloud classifier was used to find the maximum probability with which each lemma was marked as a certain secondary structure type. The lemma information was then introduced into the training of the stochastic grammar as prior information to predict the secondary structure of RNA, and the method was tested experimentally. [Result] The experimental results showed that the prediction accuracy and search speed of the stochastic grammar cloud model were significantly improved over prediction with a simple stochastic grammar. [Conclusion] This study lays a foundation for the wide application of stochastic grammar models to RNA secondary structure prediction.
We have previously reported that the human ACAT1 gene produces a chimeric mRNA through the interchromosomal processing of two discontinuous RNAs transcribed from chromosomes 1 and 7. The chimeric mRNA uses AUG1397-1399 and GGC1274-1276 as translation initiation codons to produce the normal 50-kDa ACAT1 and a novel enzymatically active 56-kDa isoform, respectively, with the latter being authentically present in human cells, including human monocyte-derived macrophages. In this work, we report that RNA secondary structures located in the vicinity of the GGC1274-1276 codon are required for production of the 56-kDa isoform. The effects of the three predicted stem-loops (nt 1255-1268, 1286-1342 and 1355-1384) were tested individually by transfecting cells with expression plasmids that contained the wild-type, deleted or mutant stem-loop sequences linked to a partial ACAT1 AUG open reading frame (ORF) or to the ORFs of other genes. The expression patterns were monitored by western blot analyses. We found that the upstream stem-loop1255-1268 from chromosome 7 and the downstream stem-loop1286-1342 from chromosome 1 were needed for production of the 56-kDa isoform, whereas the last stem-loop1355-1384 from chromosome 1 was dispensable. The results of experiments using both monocistronic and bicistronic vectors with a stable hairpin showed that translation initiation from the GGC1274-1276 codon was mediated by an internal ribosome entry site (IRES). Further experiments revealed that translation initiation from the GGC1274-1276 codon requires the upstream AU-constituted RNA secondary structure and the downstream GC-rich structure. This mechanistic work provides further support for the biological significance of the chimeric nature of the human ACAT1 transcript.
A novel method for the prediction of RNA secondary structure was proposed based on particle swarm optimization (PSO). PSO is known to be effective in solving many different types of optimization problems and for being able to approximate the global optimum in the solution space. We designed an efficient objective function based on the minimum free energy, the number of selected stems and the average length of the selected stems. We calculated how many legal stems there were in the sequence, and selected some of them to obtain an optimal result using PSO in light of the objective function. A method based on improved particle swarm optimization (IPSO) was proposed to predict RNA secondary structure, which consisted of three stages. The first stage encoded the source sequences and explored all the legal stems, producing a set of encoded stems as input data for the second stage. In the second stage, IPSO was responsible for structure selection. Finally, the optimal result was obtained from the secondary structures selected via IPSO. Nine sequences from the comparative RNA website were selected for the evaluation of the proposed method. Compared with six other methods, the proposed method decreased the complexity and enhanced the sensitivity and specificity according to the experimental results.
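The first stage described above, exploring all legal stems, can be illustrated with a minimal sketch. It is not the authors' encoding: the pairing rules (Watson-Crick plus G-U wobble), the function name `find_stems`, and the thresholds `min_stem` and `min_loop` are assumptions chosen for illustration, since the abstract does not define what makes a stem "legal".

```python
from typing import List, Tuple

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq: str, min_stem: int = 3, min_loop: int = 3) -> List[Tuple[int, int, int]]:
    """Enumerate maximal stems as (i, j, length).

    A stem pairs position i+k with position j-k for k = 0 .. length-1.
    min_stem and min_loop are hypothetical thresholds, not values from the abstract.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip pairs that merely continue a stem starting one step further out.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while bases pair and the enclosed loop stays long enough.
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_stem:
                stems.append((i, j, length))
    return stems


if __name__ == "__main__":
    print(find_stems("GGGAAAACCC"))
```

A selection stage such as PSO would then choose a mutually compatible subset of these candidate stems; that step is not sketched here because the abstract does not specify the objective function in enough detail.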
Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has reached a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that shows better performance than four existing popular multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications of the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur if reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those in published papers.
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding is the key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to the steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools with tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in predicting RNA secondary structures in the complex tasks, at least for the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
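One common way to score a predicted secondary structure against a reference is to compare their sets of base pairs. The sketch below is an illustration only: the abstract does not say which accuracy metric was used, and the dot-bracket input format and the function names `pairs_from_dot_bracket` and `sensitivity_ppv` are assumptions. It handles balanced, pseudoknot-free dot-bracket strings.

```python
from typing import Set, Tuple


def pairs_from_dot_bracket(structure: str) -> Set[Tuple[int, int]]:
    """Convert a balanced dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.add((stack.pop(), idx))
    return pairs


def sensitivity_ppv(predicted: str, reference: str) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) over base pairs."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_pos = len(pred & ref)  # pairs present in both structures
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # The prediction recovers two of the three reference pairs and adds no wrong ones.
    print(sensitivity_ppv("((......))", "(((....)))"))
```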
RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might be associated with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to be associated with less RNA structure in the 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure, maintaining efficient biological processes such as translation. Comparative analysis of the RNA structurome between rice and Arabidopsis revealed that higher GC content does not lead to stronger structure or less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure correspond to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a layer of selection separate from sequence between monocots and dicots. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptational flexibility during angiosperm evolution.
Background: RNA structure is the crucial basis for RNA function in various cellular processes. Over the last decade, high-throughput structure profiling (SP) experiments have brought enormous insight into RNA secondary structure. Results: In this review, we first provide an overview of approaches for RNA secondary structure prediction, including free energy-based algorithms and comparative sequence analysis. Then we introduce SP technologies, databases to document SP data, and pipelines/algorithms to normalize and interpret SP data. Computational frameworks that incorporate SP data in RNA secondary structure prediction are also presented. Conclusions: We finally discuss potential directions for improvement in the prediction and differential analysis of RNA secondary structure.
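Free energy-based algorithms are typically dynamic programs over subsequences. As a simplified stand-in, the sketch below implements Nussinov-style base-pair maximization, which counts pairs instead of summing thermodynamic loop energies; it is not the method of any tool named in this list. The pairing rules, the `min_loop` constraint, and the function name `max_base_pairs` are assumptions for illustration.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by every pair (a hypothetical hairpin constraint)."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]; 0 where j <= i.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:  # case 2: j pairs with k
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAACCC"))
```

An energy-based method follows the same recursion pattern but replaces the `+ 1` per pair with stacking and loop energy terms and minimizes the total.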
Secondary structure is a fundamental feature of both non-coding RNAs (ncRNAs) and messenger RNAs (mRNAs). However, our understanding of the secondary structures of mRNAs, especially those of the coding regions, remains elusive, likely due to translation and the lack of RNA-binding proteins that sustain the consensus structure like those binding to ncRNAs. Indeed, mRNAs have recently been found to adopt diverse alternative structures, but the overall functional significance remains untested. We approach this problem by estimating the folding specificity, i.e., the probability that a fragment of an mRNA folds back to the same partner once refolded. We show that the folding specificity of mRNAs is lower than that of ncRNAs and exhibits moderate evolutionary conservation. Notably, we find that specific rather than alternative folding is likely evolutionarily adaptive, since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests the ability to modulate ribosome movement as one potential functional advantage provided by specific folding. Our findings reveal a novel facet of the RNA structurome with important functional and evolutionary implications and indicate a potential method for distinguishing the mRNA secondary structures maintained by natural selection from molecular noise.
Background: RNA secondary structures play a pivotal role in posttranscriptional regulation and the functions of non-coding RNAs, yet in vivo RNA secondary structures remain enigmatic. PARIS (Psoralen Analysis of RNA Interactions and Structures) is a recently developed high-throughput sequencing-based approach that enables direct capture of RNA duplex structures in vivo. However, the existence of incompatible, fuzzy pairing information obstructs the integration of PARIS data with the existing tools for reconstructing RNA secondary structure models at single-base resolution. Methods: We introduce IRIS, a method for predicting RNA secondary structure ensembles based on PARIS data. IRIS generates a large set of candidate RNA secondary structure models under the guidance of redistributed PARIS reads and then uses a Bayesian model to identify the optimal ensemble, according to both thermodynamic principles and PARIS data. Results: The RNA structure ensembles predicted by IRIS have been verified based on evolutionary conservation information and consistency with other experimental RNA structural data. IRIS is implemented in Python and freely available at http://iris.zhanglab.net. Conclusion: IRIS capitalizes upon PARIS data to improve the prediction of in vivo RNA secondary structure ensembles. We expect that IRIS will enhance the application of the PARIS technology and shed more light on in vivo RNA secondary structures.
Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure. The prediction of RNA secondary structure can be facilitated by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenetic analysis of homologous RNA sequences, which is probably why these methods are not ideal in practical application. Hence, we present a new SCFG-based method that integrates phylogenetic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to depict the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenetic analysis of homologous RNA sequences, where λ is for non-structural regions of the sequence and λ' is for structural regions of the sequence; 3) we merge λ and λ' into M to devise a combined model for prediction of RNA secondary structure. We tested our method on data sets constructed from the Rfam database. The sensitivity and specificity of our method are higher than those of the predictions by Pfold.
We propose a novel model to predict RNA secondary structure based on fuzzy set theory. Through the fuzzy partition of state spaces and the incorporation of fuzzy goals, we can effectively find the optimal fuzzy policy of the model using a fuzzy dynamic programming algorithm, and then determine optimal and suboptimal RNA secondary structures. Compared to the existing sophisticated prediction models, such as Zuker's method and the SCFG model, our fuzzy model-based approach has several advantages: 1) computational complexity can be reduced by the fuzzy partition; 2) the optimal secondary structure and several suboptimal ones can be generated simultaneously; and 3) subjective prior knowledge can readily be incorporated. This paper presents a complete description of our fuzzy model and gives the implementation of the proposed method. We also apply the BJK fuzzy model structure to secondary structure predictions based on datasets of tRNA and tmRNA sequences. In a comparison of our fuzzy method with both the minimum free energy-based mfold tool and the BJK grammar model of SCFG, our experimental results validate the effectiveness of the proposed method and show that the prediction accuracy is further improved.
In the application of RNAi technology, designing an siRNA suited to the target gene is an essential step. At present, there are many studies and conclusions on siRNA design. This paper examines the influence of mRNA secondary structure and siRNA antisense-strand secondary structure on siRNA silencing efficiency. The paper also discusses open problems and offers further insights into this research.
The attenuated vaccine strains of CSFV have a 12-nucleotide (nt) insertion in the 3'-UTR of the genome compared with CSFV virulent strains. In this study, we found a distinct heterogeneity in the 3'-UTR of the attenuated Thiverval and HCLV strains. The longest 3'-UTR of the Thiverval strain was 259 base pairs (bp) with a 32-nt insertion, and the shortest 3'-UTR had only 233 bp with a 6-nt insertion. The longest 3'-UTR of the HCLV strain was 244 bp with a 17-nt insertion and the shortest 3'-UTR was 235 bp with an 8-nt insertion. Compared with the published 3'-UTR sequences of vaccine and virulent strains, the 3'-UTRs of CSFV vaccine strains have two variable regions where insertions among the different vaccine strains were frequently found. The first is located between the second conservative TALk codon and the start of the T-rich region, where we found variable-length insertions within the same vaccine strain (Thiverval or HCLV). The second is located between the end of the T-rich region and the front of the GAA codon; however, a 4-nt deletion was found in this region in the virulent Shimen strain. These two regions may represent "hot spots" for mutation. Modeling the secondary structures of the 3'-UTR suggests that the T-rich insertion could change the structure and free energy, thus affecting the stability of the 3'-UTR structure. These findings will help to understand the mechanism of attenuated vaccines and improve vaccine safety, stability, and efficacy.
The potato rot nematode (Ditylenchus destructor) is an economically important nematode of agronomic and horticultural plants worldwide. In this study, 43 populations of D. destructor were collected from different hosts across China, including 37 populations from Chinese herbal medicine plants. The obtained sequences of the ITS-rDNA and D2–D3 of 28S-rDNA genes of D. destructor were compared and analyzed. Nine types of significant length variation in ITS sequences were observed among all populations. The differences in ITS1 length were mainly caused by the presence of repetitive elements with substantial base substitutions. Reconstructions of ITS1 secondary structures showed that the minisatellites formed a stem structure. Ten haplotypes were observed in all populations based on mutations and variations of helix H9. Among them, 3 known haplotypes (A–C) were found in 7 populations isolated from potato, sweet potato, and Codonopsis pilosula, and 7 unique haplotypes were found in the other 36 populations collected from C. pilosula and Angelica sinensis when compared with the 7 haplotypes (A–G) of Subbotin's system. These unique haplotypes were different from haplotypes A–G, and we named them haplotypes H–N. The present results showed that a total of 14 haplotypes (A–N) of ITS-rDNA have been found in D. destructor. Phylogenetic analyses of ITS-rDNA and D2–D3 showed that all populations of D. destructor clustered into two major clades: one clade containing only haplotype A from sweet potato and the other containing haplotypes B–N from other plants. For further verification, PCR-ITS-RFLP profiles were generated for the 7 new haplotypes. Collectively, our study suggests that D. destructor populations on Chinese medicinal materials are very different from those on other hosts, and this work provides a paradigm for related research.
Genomic surveillance of monkeypox virus (MPXV) is essential to explore the reason for its unusual outbreak. Current phylogenomic analysis of the MPXV genome mainly focuses on the effect of amino acid mutations. Herein, we explore the evolutionary variation of an RNA G-quadruplex (RG4) of MPXV and find that the genome evolution of MPXV can also produce new effects through changes in the RG4 structure. This RG4 is located in MPXV's only Kelch-like gene, C9L, which encodes an antagonist of the innate immune response. The evolution of this virus increases the unfolding kinetic constant of C9L RG4 and promotes the C9 protein level in living cells. Importantly, all reported MPXV genomes in 2022 carry the C9L-RG4-5 pattern with the highest unfolding kinetic constant. Additionally, the RG4 ligand RGB-1 can impede the unfolding of C9L-RG4-5 and thereby reduce the C9 protein level. These findings carve out a new path to comprehensively understanding MPXV virology.
Several experiments and observations have revealed that small local distinct structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information concerning which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty lies in the fact that in nearly all cases the structure of the molecules is unknown and has to be somehow predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on the preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but function less well when similarity is limited to small and local elements, like single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for the regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures. Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA, and the domain IV stem-loop structure in SRP RNA.
RNA folds into intricate structures that are crucial for its functions and regulation. To date, a multitude of approaches for probing structures of the whole transcriptome, i.e., RNA structuromes, have been developed. Applications of these approaches to different cell lines and tissues have generated a rich resource for the study of RNA structure-function relationships at a systems biology level. In this review, we first introduce the designs of these methods and their applications to study different RNA structuromes. We emphasize their technological differences, especially their unique advantages and caveats. We then summarize the structural insights into RNA functions and regulation obtained from studies of RNA structuromes. Finally, we propose potential directions for future improvements and studies.
RNA structures are essential to support RNA functions and regulation in various biological processes. Recently, a range of novel technologies have been developed to decode genome-wide RNA structures and novel modes of functionality across a wide range of species. In this review, we summarize key strategies for probing the RNA structurome and discuss the pros and cons of representative technologies. In particular, these new technologies have been applied to dissect the structural landscape of the SARS-CoV-2 RNA genome. We also summarize the functionalities of RNA structures discovered in different regulatory layers (including RNA processing, transport, localization, and mRNA translation) across viruses, bacteria, animals, and plants. We review many versatile RNA structural elements in the context of different physiological and pathological processes (e.g., cell differentiation, stress response, and viral replication). Finally, we discuss future prospects for RNA structural studies to map the RNA structurome at higher resolution and at the single-molecule and single-cell level, and to decipher novel modes of RNA structures and functions for innovative applications.
Objective: To investigate the role of a potential diabetes-related mitochondrial region, which includes two previously reported mutations, 3243A→G and 3316G→A, in Chinese patients with adult-onset type 2 diabetes. Methods: A total of 277 patients and 241 normal subjects were recruited for the study. Mitochondrial nt 3116-3353, which spans the 16S rRNA, tRNA Leu(UUR) and NADH dehydrogenase 1 genes, was analyzed using polymerase chain reaction (PCR), direct DNA sequencing, PCR restriction fragment length polymorphism and allele-specific PCR. Variants were analyzed by two-tailed Fisher exact test. The effects of the variants in 16S rRNA on minimal free energy secondary structures were predicted with the RNA folding software mfold version 3. Results: Four homoplasmic nucleotide substitutions were observed: 3200T→C, 3206C→T, 3290T→C and 3316G→A. Only the 3200T→C mutation was present in the diabetic population and absent in the control population. No statistically significant associations were found between the other three variants and type 2 diabetes. The 3200T→C and 3206C→T nucleotide substitutions located in 16S rRNA are novel variants. The 3200T→C substitution caused a large alteration in the minimal free energy secondary structure model, while 3206C→T altered the normal 16S rRNA structure only slightly. Conclusions: The results suggest that the 3200T→C mutation is linked to the development of type 2 diabetes, but that the other observed mutations are neutral. In contrast to the Japanese studies, 3316G→A does not appear to be related to type 2 diabetes.
Funding (stochastic grammar cloud model study): Supported by the Science Foundation of Hengyang Normal University of China (09A36).
Funding: Supported by the National Natural Science Foundation of China (No. 60971089).
Abstract: A novel method for the prediction of RNA secondary structure was proposed based on particle swarm optimization (PSO). PSO is known to be effective in solving many different types of optimization problems and is able to approximate the global optimum in the solution space. We designed an efficient objective function according to the minimum free energy, the number of selected stems and the average length of the selected stems. We enumerated the legal stems in the sequence and selected some of them to obtain an optimal result using PSO guided by the objective function. The proposed method, based on an improved particle swarm optimization (IPSO), consists of three stages. The first stage encodes the source sequences and explores all the legal stems, creating a set of encoded stems as input data for the second stage. In the second stage, IPSO is responsible for structure selection. Finally, the optimal result is obtained from the secondary structures selected via IPSO. Nine sequences from the comparative RNA website were selected for the evaluation of the proposed method. Compared with six other methods, the proposed method decreased the complexity and enhanced the sensitivity and specificity according to the experimental results.
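The stem-enumeration stage described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `find_stems`, the choice of allowed pairs (Watson-Crick plus G-U wobble), and the `min_len` and `min_loop` thresholds are all assumptions made for illustration, since the abstract does not define what counts as a "legal" stem.

```python
from typing import List, Tuple

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Tuple[int, int, int]]:
    """Enumerate maximal stems as (i, j, length).

    A stem pairs positions i, i+1, ..., i+length-1 with j, j-1, ..., j-length+1
    (0-based), leaving at least `min_loop` unpaired bases inside the innermost pair.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Only start at the outermost pair, so each stem is reported once.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while bases pair and the loop stays long enough.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


if __name__ == "__main__":
    print(find_stems("GGGAAAACCC"))
```

A selection step such as PSO would then choose a compatible subset of these candidate stems; that step is not sketched here because the abstract does not give the objective function's weights.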
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Abstract: Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has reached a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that shows better performance than four existing popular multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications of the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur where reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those in published papers.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32000462 to Fei Qi, Grant No. 32170619 to Philipp Kapranov, and Grant No. 32201055 to Yue Chen); the Research Fund for International Senior Scientists from the National Natural Science Foundation of China (Grant No. 32150710525 to Philipp Kapranov); the Natural Science Foundation of Fujian Province, China (Grant No. 2020J02006 to Philipp Kapranov); and the Scientific Research Funds of Huaqiao University, China (Grant No. 22BS114 to Fei Qi, Grant No. 21BS127 to Yue Chen, and Grant No. 15BS101 to Philipp Kapranov).
Abstract: Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding represents the key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to a steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools with tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in the complex tasks in predicting the RNA secondary structures, at least based on the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
Abstract: RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might be associated with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to be associated with less RNA structure in the 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure to maintain efficient biological processes such as translation. Comparative analysis of the RNA structurome between rice and Arabidopsis revealed that higher GC content does not lead to stronger structure and less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure correspond to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a layer of selection separate from sequence between monocots and dicots. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptational flexibility during angiosperm evolution.
Funding: the National Natural Science Foundation of China (No. 11601259); the Shanghai Municipal Science and Technology Major Project (No. 2017SHZDZX01).
Abstract: Background: RNA structure is the crucial basis for RNA function in various cellular processes. Over the last decade, high-throughput structure profiling (SP) experiments have brought enormous insight into RNA secondary structure. Results: In this review, we first provide an overview of approaches for RNA secondary structure prediction, including free energy-based algorithms and comparative sequence analysis. Then we introduce SP technologies, databases to document SP data, and pipelines/algorithms to normalize and interpret SP data. Computational frameworks that incorporate SP data in RNA secondary structure prediction are also presented. Conclusions: We finally discuss potential directions for improvement in the prediction and differential analysis of RNA secondary structure.
Funding: Supported by the National Key Technology R&D Program of China (Grant Nos. 2017YFA0103504 to XC, 2018ZX10301402 to JRY); the National Natural Science Foundation of China (Grant Nos. 31671320, 31871320, and 81830103 to JRY); and the start-up grants from the "100 Top Talents Program" of Sun Yat-sen University, China (Grant Nos. 50000-18821112 to XC, 50000-18821117 to JRY).
Abstract: Secondary structure is a fundamental feature of both non-coding RNAs (ncRNAs) and messenger RNAs (mRNAs). However, our understanding of the secondary structures of mRNAs, especially those of the coding regions, remains limited, likely due to translation and the lack of RNA-binding proteins that sustain the consensus structure like those binding to ncRNAs. Indeed, mRNAs have recently been found to adopt diverse alternative structures, but the overall functional significance remains untested. We hereby approach this problem by estimating the folding specificity, i.e., the probability that a fragment of an mRNA folds back to the same partner once refolded. We show that the folding specificity of mRNAs is lower than that of ncRNAs and exhibits moderate evolutionary conservation. Notably, we find that specific rather than alternative folding is likely evolutionarily adaptive, since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests the ability to modulate ribosome movement as one potential functional advantage provided by specific folding. Our findings reveal a novel facet of the RNA structurome with important functional and evolutionary implications and indicate a potential method for distinguishing the mRNA secondary structures maintained by natural selection from molecular noise.
Funding: the Chinese Ministry of Science and Technology (No. 2018YFA0107603 to Q.C.Z.); the National Natural Science Foundation of China (Nos. 91740204 and 31761163007 to Q.C.Z.); the National Natural Science Foundation of China (No. 61772197 to T.J.); the National Key Research and Development Program of China (No. 2018YFC0910404 to T.J.).
Abstract: Background: RNA secondary structures play a pivotal role in posttranscriptional regulation and the functions of non-coding RNAs, yet in vivo RNA secondary structures remain enigmatic. PARIS (Psoralen Analysis of RNA Interactions and Structures) is a recently developed high-throughput sequencing-based approach that enables direct capture of RNA duplex structures in vivo. However, the existence of incompatible, fuzzy pairing information obstructs the integration of PARIS data with the existing tools for reconstructing RNA secondary structure models at single-base resolution. Methods: We introduce IRIS, a method for predicting RNA secondary structure ensembles based on PARIS data. IRIS generates a large set of candidate RNA secondary structure models under the guidance of redistributed PARIS reads and then uses a Bayesian model to identify the optimal ensemble, according to both thermodynamic principles and PARIS data. Results: The RNA structure ensembles predicted by IRIS have been verified based on evolutionary conservation information and consistency with other experimental RNA structural data. IRIS is implemented in Python and freely available at http://iris.zhanglab.net. Conclusion: IRIS capitalizes upon PARIS data to improve the prediction of in vivo RNA secondary structure ensembles. We expect that IRIS will enhance the application of the PARIS technology and shed more light on in vivo RNA secondary structures.
Funding: the National Natural Science Foundation of China under Grant No. 60673018.
Abstract: Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure. The prediction of RNA secondary structure can be facilitated by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenic analysis of homologous RNA sequences, which is probably the reason why these methods are not ideal in practical application. Hence, we present a new SCFG-based method by integrating phylogenic analysis with the newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to depict the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenic analysis of homologous RNA sequences. Here, λ is for non-structural regions of the sequence and λ' is for structural regions of the sequence; 3) we merge λ and λ' into M to devise a combined model for prediction of RNA secondary structure. We tested our method on data sets constructed from the Rfam database. The sensitivity and specificity of our method are higher than those of the predictions by Pfold.
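Several abstracts in this listing report sensitivity and specificity without defining them. A minimal Python sketch of one common way to score a predicted structure against a reference is given below. It assumes dot-bracket notation without pseudoknots, and it computes positive predictive value (PPV), which RNA structure papers often report under the name "specificity"; whether the papers listed here use exactly this definition is an assumption. The function names are hypothetical.

```python
from typing import Set, Tuple


def pairs_from_dot_bracket(structure: str) -> Set[Tuple[int, int]]:
    """Convert a pseudoknot-free dot-bracket string into a set of (i, j) base pairs."""
    stack = []
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def sensitivity_ppv(predicted: str, reference: str) -> Tuple[float, float]:
    """Return (sensitivity, PPV) of predicted base pairs against reference pairs."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0  # TP / (TP + FN)
    ppv = true_pos / len(pred) if pred else 0.0        # TP / (TP + FP)
    return sensitivity, ppv


if __name__ == "__main__":
    # The prediction recovers two of the three reference pairs and adds no false pairs.
    print(sensitivity_ppv("((.....))", "(((...)))"))
```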
Funding: the National Natural Science Foundation of China (Grant No. 60621062); the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of MOE (TRAPOYT), China.
Abstract: We propose a novel model to predict RNA secondary structure based on fuzzy set theory. Through the fuzzy partition of state spaces and the incorporation of fuzzy goals, we can effectively find the optimal fuzzy policy of the model using a fuzzy dynamic programming algorithm, and then determine optimal and suboptimal RNA secondary structures. Compared to existing sophisticated prediction models, such as Zuker's method and the SCFG model, our fuzzy-model-based approach has several advantages: 1) computational complexity can be reduced by the fuzzy partition; 2) the optimal secondary structure and several suboptimal ones can be generated simultaneously; and 3) subjective prior knowledge can readily be incorporated. This paper presents a complete description of our fuzzy model and gives the implementation of the proposed method. We also apply the BJK fuzzy model structure to secondary structure predictions based on datasets of tRNA and tmRNA sequences. Comparing our fuzzy method with both the minimum free energy based mfold tool and the BJK grammar model of SCFG, our experimental results validate the effectiveness of the proposed method and show that the prediction accuracy is further improved.
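For readers unfamiliar with the dynamic programming baselines this abstract compares against, the following Python sketch shows the classic Nussinov base-pair maximization recursion. It is a deliberately simplified stand-in: it is neither the fuzzy dynamic programming of the paper nor Zuker's free-energy model, and the pairing rules and `min_loop` value are assumptions chosen for illustration.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in `seq`.

    dp[i][j] holds the best pair count for the subsequence seq[i..j];
    a pair (k, j) must enclose at least `min_loop` unpaired bases.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    # case 2: j pairs with k, splitting the interval in two
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAAACCC"))
```

The cubic running time of this recursion is the kind of cost that a coarser partition of the state space, as the abstract describes, would aim to reduce.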
Abstract: In the application of RNAi technology, designing siRNA suited to the target gene is an essential step. At present, there are many studies and conclusions on siRNA design. This paper addresses the influence of mRNA secondary structure and siRNA antisense-strand secondary structure on siRNA silencing efficiency. The paper also discusses open problems and offers further insights for research.
Funding: Supported by the National Natural Science Foundation of China (30571377); the National High-Tech R&D Program of China (863 Program, 2006AA10A204).
Abstract: The attenuated vaccine strains of CSFV have a 12-nucleotide (nt) insertion in the 3'-UTR of the genome compared with CSFV virulent strains. In this study, we found a distinct heterogeneity in the 3'-UTR of the attenuated Thiverval and HCLV strains. The longest 3'-UTR of the Thiverval strain was 259 base pairs (bp) with a 32-nt insertion, and the shortest 3'-UTR had only 233 bp with a 6-nt insertion. The longest 3'-UTR of the HCLV strain was 244 bp with a 17-nt insertion and the shortest 3'-UTR was 235 bp with an 8-nt insertion. Compared with the published 3'-UTR sequences of vaccine and virulent strains, the 3'-UTR of CSFV vaccine strains has two variable regions where insertions among the different vaccine strains were frequently found. The first is located between the second conservative TALk codon and the start of the T-rich region, where we found variable-length insertions within the same vaccine strain (Thiverval or HCLV); the second is located between the end of the T-rich region and the front of the GAA codon, although a 4-nt deletion was found in this region in the virulent Shimen strain. These two regions may represent "hot spots" for mutation. Modeling the secondary structures of the 3'-UTR suggests that the T-rich insertion could change the structure and free energy, thus affecting the stability of the 3'-UTR structure. These findings will help to understand the mechanism of attenuated vaccines and improve vaccine safety, stability, and efficacy.
Funding: Supported by the National Natural Science Foundation of China (31760507); the National Key R&D Program of China (2018YFC1706301).
Abstract: The potato rot nematode (Ditylenchus destructor) is an economically important nematode of agronomic and horticultural plants worldwide. In this study, 43 populations of D. destructor were collected from different hosts across China, including 37 populations from Chinese herbal medicine plants. The obtained sequences of the ITS-rDNA and D2–D3 of 28S-rDNA genes of D. destructor were compared and analyzed. Nine types of significant length variations in ITS sequences were observed among all populations. The differences in ITS1 length were mainly caused by the presence of repetitive elements with substantial base substitutions. Reconstructions of ITS1 secondary structures showed that the minisatellites formed a stem structure. Ten haplotypes were observed in all populations based on mutations and variations of helix H9. Among them, 3 known haplotypes (A–C) were found in 7 populations isolated from potato, sweet potato, and Codonopsis pilosula, and 7 unique haplotypes were found in the other 36 populations collected from C. pilosula and Angelica sinensis, compared with the 7 haplotypes (A–G) of Subbotin's system. These unique haplotypes were different from haplotypes A–G, and we named them haplotypes H–N. The present results showed that a total of 14 haplotypes (A–N) of ITS-rDNA have been found in D. destructor. Phylogenetic analyses of ITS-rDNA and D2–D3 showed that all populations of D. destructor were clustered into two major clades: one clade containing only haplotype A from sweet potato and the other containing haplotypes B–N from other plants. For further verification, PCR-ITS-RFLP profiles were generated for the 7 new haplotypes. Collectively, our study suggests that D. destructor populations on Chinese medicinal materials are very different from those on other hosts, and this work provides a paradigm for related research.
Funding: Supported by the National Natural Science Foundation of China (grant nos. 22034004 and 22027807); the National Key Research and Development Program of China (grant no. 2021YFA1200104); the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDB36000000); the Vanke Special Fund for Public Health and Health Discipline Development (grant no. 2022Z82WKJ003).
Abstract: Genomic surveillance of monkeypox virus (MPXV) is essential to explore the reasons for its unusual outbreak. Current phylogenomic analysis of the MPXV genome mainly focuses on the effect of amino acid mutations. Herein, we explore the evolutionary variation of an RNA G-quadruplex (RG4) of MPXV and find that the genome evolution of MPXV can also produce new effects through changes in the RG4 structure. This RG4 is located in MPXV's only Kelch-like gene, C9L, which encodes an antagonist of the innate immune response. The evolution of this virus increases the unfolding kinetic constant of the C9L RG4 and promotes the C9 protein level in living cells. Importantly, all reported MPXV genomes in 2022 carry the C9L-RG4-5 pattern with the highest unfolding kinetic constant. Additionally, the RG4 ligand RGB-1 can impede the unfolding of C9L-RG4-5 and thereby reduce the C9 protein level. These findings carve out a new path to comprehensively understanding MPXV virology.
Abstract: Several experiments and observations have revealed the fact that small local distinct structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information concerning which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty lies in the fact that in nearly all the cases the structure of the molecules is unknown, has to be somehow predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on the preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but function less well when similarity is limited to small and local elements, like single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint, or a posteriori, by post-processing the output. The search for the regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures. Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of noncoding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA, and the domain IV stem-loop structure in SRP RNA.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31671355); the National Thousand Young Talents Program of China to QCZ.
Abstract: RNA folds into intricate structures that are crucial for its functions and regulation. To date, a multitude of approaches for probing structures of the whole transcriptome, i.e., RNA structuromes, have been developed. Applications of these approaches to different cell lines and tissues have generated a rich resource for the study of RNA structure-function relationships at a systems biology level. In this review, we first introduce the designs of these methods and their applications to study different RNA structuromes. We emphasize their technological differences, especially their unique advantages and caveats. We then summarize the structural insights into RNA functions and regulation obtained from the studies of RNA structuromes. Finally, we propose potential directions for future improvements and studies.
Funding: Supported by the National Key Research and Development Program of China (2021YFE0114900); the National Natural Science Foundation of China (91940303, 91940306, 32025008, 32170262, 31922039, U1832215, 32170229); the Natural Science Foundation of Zhejiang Province (LD21C050002); the Starry Night Science Fund at Shanghai Institute for Advanced Study of Zhejiang University (SN-ZJU-SIAS-009); the Beijing Advanced Innovation Center for Structural Biology; Shenzhen Basic Research Project (JCYJ20180507181642811); Research Grants Council of the Hong Kong SAR, China Projects (City U 11100421, City U 11101519, City U 11100218, N_City U110/17); Croucher Foundation Project (9509003); State Key Laboratory of Marine Pollution Director Discretionary Fund, City University of Hong Kong Projects (7005503, 9667222, 9680261); the United Kingdom Biotechnology and Biological Sciences Research Council (BBSRC: BBS/E/J/000PR9788).
Abstract: RNA structures are essential to support RNA functions and regulation in various biological processes. Recently, a range of novel technologies have been developed to decode genome-wide RNA structures and novel modes of functionality across a wide range of species. In this review, we summarize key strategies for probing the RNA structurome and discuss the pros and cons of representative technologies. In particular, these new technologies have been applied to dissect the structural landscape of the SARS-CoV-2 RNA genome. We also summarize the functionalities of RNA structures discovered in different regulatory layers (including RNA processing, transport, localization, and mRNA translation) across viruses, bacteria, animals, and plants. We review many versatile RNA structural elements in the context of different physiological and pathological processes (e.g., cell differentiation, stress response, and viral replication). Finally, we discuss future prospects for RNA structural studies to map the RNA structurome at higher resolution and at the single-molecule and single-cell level, and to decipher novel modes of RNA structures and functions for innovative applications.
Abstract: Objective: To investigate the role of a potential diabetes-related mitochondrial region, which includes two previously reported mutations, 3243A→G and 3316G→A, in Chinese patients with adult-onset type 2 diabetes. Methods: A total of 277 patients and 241 normal subjects were recruited for the study. Mitochondrial nt 3116-3353, which spans the 16S rRNA, tRNA Leu(UUR) and NADH dehydrogenase 1 genes, was analyzed using polymerase chain reaction (PCR), direct DNA sequencing, PCR restriction fragment length polymorphism and allele-specific PCR. Variants were analyzed by two-tailed Fisher exact test. The effects of the variants in 16S rRNA on minimal free energy secondary structures were predicted with the RNA folding software mfold version 3. Results: Four homoplasmic nucleotide substitutions were observed: 3200T→C, 3206C→T, 3290T→C and 3316G→A. Only the 3200T→C mutation was present in the diabetic population and absent in the control population. No statistically significant associations were found between the other three variants and type 2 diabetes. The 3200T→C and 3206C→T nucleotide substitutions located in 16S rRNA are novel variants. The 3200T→C substitution caused a large alteration in the minimal free energy secondary structure model, while 3206C→T altered the normal 16S rRNA structure only slightly. Conclusions: The results suggest that the 3200T→C mutation is linked to the development of type 2 diabetes, but that the other observed mutations are neutral. In contrast to the Japanese studies, 3316G→A does not appear to be related to type 2 diabetes.