A novel method for the prediction of RNA secondary structure was proposed based on particle swarm optimization (PSO). PSO is known to be effective in solving many different types of optimization problems and for being able to approximate globally optimal results in the solution space. We designed an efficient objective function based on the minimum free energy, the number of selected stems, and the average length of the selected stems. We calculated how many legal stems there were in the sequence and used PSO to select a subset of them that is optimal with respect to the objective function. A method based on an improved particle swarm optimization (IPSO) was proposed to predict RNA secondary structure; it consisted of three stages. The first stage encoded the source sequences and explored all legal stems; a set of encoded stems was then created to prepare input data for the second stage. In the second stage, IPSO was responsible for structure selection. Finally, the optimal result was obtained from the secondary structures selected via IPSO. Nine sequences from the comparative RNA website were selected to evaluate the proposed method. According to the experimental results, compared with six other methods, the proposed method reduced complexity and improved sensitivity and specificity.
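The stem-enumeration and PSO-selection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' IPSO: stems are maximal runs of Watson-Crick or G-U pairs (at least three pairs) enclosing a loop of at least three bases, pseudoknots are excluded, the objective replaces real free energies with a stacked-pair count plus hypothetical weights (`w_avg`, `w_count`) for average stem length and stem count, and the optimizer is the standard binary PSO, in which a sigmoid of each velocity gives the probability that a stem is selected. Conflicting selections are repaired by greedily keeping the longest compatible stems.

```python
import math
import random

# Watson-Crick pairs plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def legal_stems(seq, min_len=3, min_loop=3):
    """Enumerate maximal stems (i, j, length): seq[i+k] pairs with seq[j-k]."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip (i, j) if it extends outward; the longer stem already covers it.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            k = 0
            # Extend inward while bases pair and the hairpin loop keeps >= min_loop bases.
            while i + k < j - k - min_loop and (seq[i + k], seq[j - k]) in PAIRS:
                k += 1
            if k >= min_len:
                stems.append((i, j, k))
    return stems


def compatible(a, b):
    """True if two stems share no bases and do not cross (no pseudoknots)."""
    (i1, j1, l1), (i2, j2, l2) = a, b
    if j1 < i2 or j2 < i1:                      # side by side
        return True
    if i1 + l1 - 1 < i2 and j2 < j1 - l1 + 1:   # b nested inside a's loop
        return True
    if i2 + l2 - 1 < i1 and j1 < j2 - l2 + 1:   # a nested inside b's loop
        return True
    return False


def decode(bits, stems):
    """Turn a 0/1 vector into a compatible stem set, longest selected stems first."""
    selected = [k for k, b in enumerate(bits) if b]
    chosen = []
    for k in sorted(selected, key=lambda k: -stems[k][2]):
        if all(compatible(stems[k], c) for c in chosen):
            chosen.append(stems[k])
    return sorted(chosen)


def fitness(chosen, w_avg=0.5, w_count=0.2):
    """Toy objective: stacked pairs as a free-energy proxy plus hypothetical
    terms for average stem length and number of stems (higher is better)."""
    if not chosen:
        return 0.0
    stacks = sum(length - 1 for _, _, length in chosen)
    avg_len = sum(length for _, _, length in chosen) / len(chosen)
    return stacks + w_avg * avg_len - w_count * len(chosen)


def pso_select(stems, n_particles=20, iters=100, seed=0):
    """Binary PSO over stem subsets (sigmoid velocity-to-probability rule)."""
    rng = random.Random(seed)
    d = len(stems)
    if d == 0:
        return []
    pos = [[int(rng.random() < 0.5) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(decode(p, stems)) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_f[k])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    w, c1, c2, vmax = 0.7, 1.5, 1.5, 4.0
    for _ in range(iters):
        for k in range(n_particles):
            for dim in range(d):
                v = (w * vel[k][dim]
                     + c1 * rng.random() * (pbest[k][dim] - pos[k][dim])
                     + c2 * rng.random() * (gbest[dim] - pos[k][dim]))
                vel[k][dim] = max(-vmax, min(vmax, v))
                # The bit is set with probability sigmoid(velocity).
                pos[k][dim] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[k][dim])))
            f = fitness(decode(pos[k], stems))
            if f > pbest_f[k]:
                pbest[k], pbest_f[k] = pos[k][:], f
            if f > gbest_f:
                gbest, gbest_f = pos[k][:], f
    return decode(gbest, stems)


def to_dot_bracket(n, stems):
    s = ["."] * n
    for i, j, length in stems:
        for k in range(length):
            s[i + k], s[j - k] = "(", ")"
    return "".join(s)
```

For example, `to_dot_bracket(12, pso_select(legal_stems("GGGGAAAACCCC")))` returns `"((((....))))"`: with these toy weights the single four-pair stem outscores every alternative, and every other stem conflicts with it.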
Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has hit a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that performs better than four popular existing multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
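Direct coupling analysis itself requires fitting a global statistical (Potts) model to the alignment and is beyond a short example. As a much simpler, plainly different stand-in, the sketch below scores covariation between alignment columns with mutual information, the classic comparative signal that DCA refines by separating direct from indirect couplings. The function names and the `min_sep` threshold are illustrative assumptions, and nothing here reproduces the remove-and-expand algorithm.

```python
import math
from collections import Counter


def column_mi(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    pi = Counter(s[i] for s in alignment)
    pj = Counter(s[j] for s in alignment)
    pij = Counter((s[i], s[j]) for s in alignment)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


def rank_pairs(alignment, min_sep=4):
    """Score every column pair at least min_sep apart, highest MI first."""
    length = len(alignment[0])
    scores = [(i, j, column_mi(alignment, i, j))
              for i in range(length) for j in range(i + min_sep, length)]
    return sorted(scores, key=lambda t: -t[2])
```

On the toy alignment `["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]`, columns 0 and 5 covary as G-C, C-G, A-U and U-A pairs and rank first with MI = ln 4, while the conserved middle columns score 0.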
A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur where reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those reported in published papers.
RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might be associated with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to be associated with less RNA structure in the 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure, which may help maintain efficient biological processes such as translation. Comparative analysis of the RNA structuromes of rice and Arabidopsis revealed that higher GC content does not lead to stronger structure or less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure correspond to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a layer of selection, separate from that on sequence, between monocots and dicots. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptive flexibility during angiosperm evolution.
Background: RNA structure is the crucial basis for RNA function in various cellular processes. Over the last decade, high-throughput structure profiling (SP) experiments have brought enormous insight into RNA secondary structure. Results: In this review, we first provide an overview of approaches for RNA secondary structure prediction, including free energy-based algorithms and comparative sequence analysis. Then we introduce SP technologies, databases that document SP data, and pipelines/algorithms to normalize and interpret SP data. Computational frameworks that incorporate SP data in RNA secondary structure prediction are also presented. Conclusions: We finally discuss potential directions for improvement in the prediction and differential analysis of RNA secondary structure.
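As an example of the normalization step mentioned above, the sketch below implements one commonly used convention for SHAPE-type reactivities, often called 2%/8% normalization: the top 2% of values are set aside as outliers and every reactivity is divided by the mean of the next 8%. Other pipelines use box-plot outlier rules or different percentiles, so treat the cut-offs here as an assumption; missing values are represented as `None`.

```python
def normalize_2_8(reactivities):
    """2%/8% normalization of structure-probing reactivities.

    Missing values (None) are passed through unchanged.
    """
    vals = sorted((r for r in reactivities if r is not None), reverse=True)
    n = len(vals)
    n_outliers = (2 * n) // 100          # top 2% treated as outliers
    n_window = max(1, (8 * n) // 100)    # the next 8% define the scale
    window = vals[n_outliers:n_outliers + n_window]
    if not window:
        raise ValueError("not enough reactivity values to normalize")
    scale = sum(window) / len(window)
    if scale <= 0:
        raise ValueError("normalization window has a non-positive mean")
    return [None if r is None else r / scale for r in reactivities]
```

With the 100 values 1, 2, ..., 100, for instance, 100 and 99 are treated as outliers and every value is divided by 94.5, the mean of 98 down to 91.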
The secondary structure is a fundamental feature of both non-coding RNAs (ncRNAs) and messenger RNAs (mRNAs). However, our understanding of the secondary structures of mRNAs, especially those of the coding regions, remains elusive, likely because of translation and the lack of RNA-binding proteins that sustain a consensus structure, such as those that bind ncRNAs. Indeed, mRNAs have recently been found to adopt diverse alternative structures, but the overall functional significance remains untested. Here we approach this problem by estimating the folding specificity, i.e., the probability that a fragment of an mRNA folds back to the same partner once refolded. We show that the folding specificity of mRNAs is lower than that of ncRNAs and exhibits moderate evolutionary conservation. Notably, we find that specific rather than alternative folding is likely evolutionarily adaptive, since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests that the ability to modulate ribosome movement is one potential functional advantage provided by specific folding. Our findings reveal a novel facet of the RNA structurome with important functional and evolutionary implications and indicate a potential method for distinguishing mRNA secondary structures maintained by natural selection from molecular noise.
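The folding-specificity definition above can be made concrete with a small calculation. In this hypothetical sketch, the refolded structures of a fragment are assumed to be given as dot-bracket strings (the folding or sampling step that produces them is not shown), and specificity is estimated as the fraction of cases in which a base paired in the reference structure finds the same partner after refolding. This is an illustrative operationalization, not necessarily the authors' estimator.

```python
def pair_table(dot_bracket):
    """Map each position to its partner index, or None if unpaired."""
    table = [None] * len(dot_bracket)
    stack = []
    for i, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def folding_specificity(reference, refolds):
    """Fraction of (paired base, refold) cases in which the base
    finds the same partner as in the reference structure."""
    ref = pair_table(reference)
    paired = [i for i, p in enumerate(ref) if p is not None]
    if not paired or not refolds:
        return None
    hits = 0
    for s in refolds:
        t = pair_table(s)
        hits += sum(t[i] == ref[i] for i in paired)
    return hits / (len(paired) * len(refolds))
```

For example, with reference `"((..))"` and refolds `["((..))", "(....)"]`, six of the eight paired-base cases keep their partner, giving 0.75.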
Background: RNA secondary structures play a pivotal role in posttranscriptional regulation and the functions of non-coding RNAs, yet in vivo RNA secondary structures remain enigmatic. PARIS (Psoralen Analysis of RNA Interactions and Structures) is a recently developed high-throughput sequencing-based approach that enables direct capture of RNA duplex structures in vivo. However, the existence of incompatible, fuzzy pairing information obstructs the integration of PARIS data with existing tools for reconstructing RNA secondary structure models at single-base resolution. Methods: We introduce IRIS, a method for predicting RNA secondary structure ensembles based on PARIS data. IRIS generates a large set of candidate RNA secondary structure models under the guidance of redistributed PARIS reads and then uses a Bayesian model to identify the optimal ensemble according to both thermodynamic principles and PARIS data. Results: The RNA structure ensembles predicted by IRIS have been verified based on evolutionary conservation information and consistency with other experimental RNA structural data. IRIS is implemented in Python and freely available at http://iris.zhanglab.net. Conclusion: IRIS capitalizes on PARIS data to improve the prediction of in vivo RNA secondary structure ensembles. We expect that IRIS will enhance the application of the PARIS technology and shed more light on in vivo RNA secondary structures.
Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure. The prediction of RNA secondary structure can be facilitated by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenetic analysis of homologous RNA sequences, which is probably why these methods are not ideal in practical applications. Hence, we present a new SCFG-based method that integrates phylogenetic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to depict the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenetic analysis of homologous RNA sequences, where λ is for non-structural regions of the sequence and λ' is for structural regions; 3) we merge λ and λ' into M to devise a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. The sensitivity and specificity of our method are higher than those of the predictions by Pfold.
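To make the SCFG idea concrete, the sketch below runs the CYK algorithm (the max-probability counterpart of the inside algorithm) on a tiny hypothetical single-sequence grammar, S -> x S | x S y S | empty, with made-up rule and emission probabilities. It is not the profile SCFG M and contains no phylogenetic component like λ or λ'. It also imposes no minimum hairpin length and works in plain probabilities, so practical implementations would add loop constraints and work in log space to avoid underflow.

```python
# Hypothetical rule probabilities for S -> x S | x S y S | empty (they sum to 1).
P_UNPAIRED, P_PAIR, P_END = 0.3, 0.6, 0.1
E_SINGLE = {b: 0.25 for b in "ACGU"}
# Pair emissions: canonical 0.2 each, G-U wobble 0.05 each, the other ten 0.01 each.
E_PAIR = {a + b: 0.01 for a in "ACGU" for b in "ACGU"}
E_PAIR.update({"AU": 0.2, "UA": 0.2, "GC": 0.2, "CG": 0.2, "GU": 0.05, "UG": 0.05})


def cyk_fold(seq):
    """Most probable parse of seq under the toy grammar, as dot-bracket."""
    n = len(seq)
    best = [[0.0] * (n + 1) for _ in range(n + 1)]   # best[i][j] covers seq[i:j]
    back = [[None] * (n + 1) for _ in range(n + 1)]  # None: first base unpaired
    for i in range(n + 1):
        best[i][i] = P_END
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = P_UNPAIRED * E_SINGLE[seq[i]] * best[i + 1][j]
            choice = None
            for k in range(i + 1, j):  # seq[i] pairs with seq[k]
                p = P_PAIR * E_PAIR[seq[i] + seq[k]] * best[i + 1][k] * best[k + 1][j]
                if p > score:
                    score, choice = p, k
            best[i][j], back[i][j] = score, choice
    # Trace back the winning rule choices into a dot-bracket string.
    out, stack = ["."] * n, [(0, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = back[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            out[i], out[k] = "(", ")"
            stack.extend([(i + 1, k), (k + 1, j)])
    return "".join(out)
```

Under these parameters `cyk_fold("GC")` gives `"()"`, `cyk_fold("GA")` gives `".."`, and `cyk_fold("GGCC")` gives `"(())"`.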
We propose a novel model to predict RNA secondary structure based on fuzzy set theory. Through the fuzzy partition of state spaces and the incorporation of fuzzy goals, we can effectively find the optimal fuzzy policy of the model using a fuzzy dynamic programming algorithm, and then determine optimal and suboptimal RNA secondary structures. Compared with existing sophisticated prediction models, such as Zuker's method and the SCFG model, our fuzzy model-based approach has many advantages: 1) computational complexity can be reduced by the fuzzy partition; 2) the optimal secondary structure and several suboptimal ones can be generated simultaneously; and 3) subjective prior knowledge can readily be incorporated. This paper presents a complete description of our fuzzy model and gives the implementation of the proposed method. We also apply the BJK fuzzy model structure to secondary structure prediction on datasets of tRNA and tmRNA sequences. In comparisons of our fuzzy method with both the minimum free energy-based mfold tool and the BJK grammar model of SCFG, our experimental results validate the effectiveness of the proposed method and show further improved prediction accuracy.
The attenuated vaccine strains of CSFV have a 12-nucleotide (nt) insertion in the 3'-UTR of the genome compared with CSFV virulent strains. In this study, we found distinct heterogeneity in the 3'-UTRs of the attenuated Thiverval and HCLV strains. The longest 3'-UTR of the Thiverval strain was 259 base pairs (bp) with a 32-nt insertion, and the shortest was only 233 bp with a 6-nt insertion. The longest 3'-UTR of the HCLV strain was 244 bp with a 17-nt insertion, and the shortest was 235 bp with an 8-nt insertion. Comparison with the published 3'-UTR sequences of vaccine and virulent strains shows that the 3'-UTRs of CSFV vaccine strains have two variable regions where insertions among different vaccine strains were frequently found. The first is located between the second conserved TALk codon and the start of the T-rich region, where we found variable-length insertions within the same vaccine strain (Thiverval or HCLV). The second is located between the end of the T-rich region and the GAA codon; a 4-nt deletion was found in this region in the virulent Shimen strain. These two regions may represent mutation "hot spots". Modeling the secondary structures of the 3'-UTR suggests that the T-rich insertion could change the structure and free energy, thus affecting the stability of the 3'-UTR structure. These findings will help to understand the mechanism of attenuated vaccines and improve vaccine safety, stability, and efficacy.
The potato rot nematode (Ditylenchus destructor) is an economically important nematode of agronomic and horticultural plants worldwide. In this study, 43 populations of D. destructor were collected from different hosts across China, including 37 populations from Chinese herbal medicine plants. The obtained sequences of the ITS-rDNA and the D2–D3 region of the 28S-rDNA genes of D. destructor were compared and analyzed. Nine types of significant length variation in ITS sequences were observed among all populations. The differences in ITS1 length were mainly caused by the presence of repetitive elements with substantial base substitutions. Reconstructions of ITS1 secondary structures showed that the minisatellites formed a stem structure. Ten haplotypes were observed across all populations based on mutations and variations of helix H9. Among them, 3 known haplotypes (A–C) were found in 7 populations isolated from potato, sweet potato, and Codonopsis pilosula, and 7 unique haplotypes were found in the other 36 populations collected from C. pilosula and Angelica sinensis, compared with the 7 haplotypes (A–G) of Subbotin's system. These unique haplotypes differed from haplotypes A–G, and we named them haplotypes H–N. The present results show that a total of 14 ITS-rDNA haplotypes (A–N) have been found in D. destructor. Phylogenetic analyses of ITS-rDNA and D2–D3 showed that all populations of D. destructor clustered into two major clades: one containing only haplotype A from sweet potato, and the other containing haplotypes B–N from other plants. For further verification, PCR-ITS-RFLP profiles were generated for the 7 new haplotypes. Collectively, our study suggests that D. destructor populations on Chinese medicinal materials are very different from those on other hosts, and this work provides a paradigm for related research.
Several experiments and observations have revealed that small, local, distinct structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information about which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty lies in the fact that in nearly all cases the structure of the molecules is unknown and has to be predicted somehow, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements, such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output.
The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures. Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA, and the domain IV stem-loop structure in SRP RNA.
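The affix tree is the authors' data structure and is not reproduced here. As a brute-force stand-in that shows what is being searched for, the sketch below scans each sequence for stem-loops (a loop closed by an outward-extending run of base pairs) and then reports the stem/loop shapes shared by all sequences regardless of their nucleotides. The pairing rules and the `min_stem`, `min_loop`, and `max_loop` defaults are assumptions; this naive scan is the kind of work an index such as the affix tree is designed to accelerate.

```python
# Watson-Crick pairs plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=4, min_loop=3, max_loop=8):
    """Naive scan for hairpins: returns (start, stem_len, loop_len), with the
    stem extended outward from the loop as far as base pairing allows."""
    n = len(seq)
    hits = []
    for p in range(1, n):                          # p = first loop position
        for loop in range(min_loop, max_loop + 1):
            a, b, stem = p - 1, p + loop, 0
            while a >= 0 and b < n and (seq[a], seq[b]) in PAIRS:
                stem, a, b = stem + 1, a - 1, b + 1
            if stem >= min_stem:
                hits.append((a + 1, stem, loop))
    return hits


def common_shapes(seqs, **kwargs):
    """(stem_len, loop_len) shapes found in every sequence, regardless of
    the underlying nucleotides."""
    shapes = [{(s, loop) for _, s, loop in find_stem_loops(x, **kwargs)}
              for x in seqs]
    return set.intersection(*shapes)
```

For instance, `common_shapes(["GGGGAAAACCCC", "UCAGGAAACUGA"])` returns `{(4, 4)}`: both sequences form a 4-bp stem around a 4-nt loop although their sequences differ.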
RNA folds into intricate structures that are crucial for its functions and regulation. To date, a multitude of approaches for probing the structures of the whole transcriptome, i.e., RNA structuromes, have been developed. Applications of these approaches to different cell lines and tissues have generated a rich resource for the study of RNA structure-function relationships at a systems biology level. In this review, we first introduce the designs of these methods and their applications to the study of different RNA structuromes. We emphasize their technological differences, especially their unique advantages and caveats. We then summarize the structural insights into RNA functions and regulation obtained from studies of RNA structuromes. Finally, we propose potential directions for future improvements and studies.
RNA structures are essential to support RNA functions and regulation in various biological processes. Recently, a range of novel technologies have been developed to decode genome-wide RNA structures and novel modes of functionality across a wide range of species. In this review, we summarize key strategies for probing the RNA structurome and discuss the pros and cons of representative technologies. In particular, these new technologies have been applied to dissect the structural landscape of the SARS-CoV-2 RNA genome. We also summarize the functionalities of RNA structures discovered in different regulatory layers (including RNA processing, transport, localization, and mRNA translation) across viruses, bacteria, animals, and plants. We review many versatile RNA structural elements in the context of different physiological and pathological processes (e.g., cell differentiation, stress response, and viral replication). Finally, we discuss future prospects for RNA structural studies to map the RNA structurome at higher resolution and at the single-molecule and single-cell level, and to decipher novel modes of RNA structures and functions for innovative applications.
Background: Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data of ever-increasing diversity and complexity have been generated, giving rise to new challenges in interpreting and analyzing these data. Results: We review current practices in the analysis of structure profiling data, with emphasis on comparative and integrative analysis, and highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. Conclusions: To keep pace with experimental developments, methods to facilitate, enhance, and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
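One common way to integrate profiling data into free-energy-based prediction is to convert each nucleotide's reactivity into a pseudo-free-energy term of the form m·ln(reactivity + 1) + b and add it to the energy of paired nucleotides. In the sketch below, the defaults m = 2.6 and b = -0.8 kcal/mol are illustrative (published tools and studies use different values), and the clamping of negative reactivities to zero and the simple summation over paired nucleotides (real implementations apply the term per stacked pair) are assumptions.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Per-nucleotide pseudo-free energy (kcal/mol): m * ln(reactivity + 1) + b.
    Negative reactivities are clamped to 0 (an assumption of this sketch)."""
    return m * math.log(max(reactivity, 0.0) + 1.0) + b


def shape_bonus(dot_bracket, reactivities, **kwargs):
    """Simplified total: sum the pseudo-energy over every paired nucleotide."""
    return sum(shape_pseudo_energy(r, **kwargs)
               for c, r in zip(dot_bracket, reactivities) if c in "()")
```

A paired, unreactive nucleotide thus receives a stabilizing bonus of b = -0.8 kcal/mol, while a highly reactive one is penalized, which steers the folding algorithm toward structures consistent with the probing data.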
U7 small nuclear RNA (snRNA) sequences have previously been described for only a handful of animal species. Here we describe a computational search for functional U7 snRNA genes across vertebrates, including the upstream sequence elements characteristic of snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Because of the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on the putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.
Purpose – The purpose of this paper is to present a study of the effect of different types of annealing schedules on a ribonucleic acid (RNA) secondary structure prediction algorithm based on simulated annealing (SA). Design/methodology/approach – An RNA folding algorithm was implemented that assembles the final structure from potential substructures (helices). Structures are encoded as a permutation of helices, and SA searches this space of permutations. Parameters and annealing schedules were studied and fine-tuned to optimize algorithm performance. Findings – Compared with mfold, the SA algorithm shows comparable results (in terms of F-measure) even with a less sophisticated thermodynamic model. In terms of average specificity, the SA algorithm provided better results. Research limitations/implications – Most of the underlying thermodynamic models are too simplistic and incomplete to accurately model the free energy of larger structures. This is the largest limitation of free energy-based RNA folding algorithms in general. Practical implications – The algorithm offers a different approach that can be used in practice to fold RNA sequences quickly. Originality/value – The algorithm is one of only two SA-based RNA folding algorithms. The authors use a very different encoding, based on permutations of candidate helices. The in-depth study of annealing schedules and other parameters makes the algorithm a strong contender. Another benefit is that new thermodynamic models can be incorporated with relative ease (which is not the case for algorithms based on dynamic programming).
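The permutation encoding described above can be sketched as follows: a permutation of candidate helices is decoded by adding helices in order and skipping any that conflict, and simulated annealing explores permutations with swap moves, the Metropolis acceptance rule, and a geometric cooling schedule. This is a minimal sketch, not the authors' algorithm: the toy energy (-1 per stacked pair) stands in for a real thermodynamic model, pseudoknots are excluded, and the schedule parameters `t0`, `alpha`, `steps_per_t`, and `t_min` are hypothetical.

```python
import math
import random


def compatible(a, b):
    """Helices (i, j, length) are compatible if side by side or nested."""
    (i1, j1, l1), (i2, j2, l2) = a, b
    return (j1 < i2 or j2 < i1
            or (i1 + l1 - 1 < i2 and j2 < j1 - l1 + 1)
            or (i2 + l2 - 1 < i1 and j1 < j2 - l2 + 1))


def decode(perm, helices):
    """Add helices in permutation order, skipping any that conflict."""
    chosen = []
    for idx in perm:
        if all(compatible(helices[idx], c) for c in chosen):
            chosen.append(helices[idx])
    return chosen


def energy(chosen):
    """Toy energy: -1 per stacked pair (not a real thermodynamic model)."""
    return -sum(length - 1 for _, _, length in chosen)


def anneal(helices, t0=2.0, alpha=0.95, steps_per_t=50, t_min=0.01, seed=0):
    rng = random.Random(seed)
    perm = list(range(len(helices)))
    rng.shuffle(perm)
    cur_e = energy(decode(perm, helices))
    best, best_e = perm[:], cur_e
    t = t0
    while t > t_min and len(perm) > 1:
        for _ in range(steps_per_t):
            a, b = rng.sample(range(len(perm)), 2)
            cand = perm[:]
            cand[a], cand[b] = cand[b], cand[a]          # swap move
            e = energy(decode(cand, helices))
            # Metropolis rule: always accept downhill, uphill with exp(-dE / t).
            if e <= cur_e or rng.random() < math.exp((cur_e - e) / t):
                perm, cur_e = cand, e
                if e < best_e:
                    best, best_e = cand[:], e
        t *= alpha                                       # geometric cooling
    return decode(best, helices), best_e
```

With helices `[(0, 10, 3), (1, 11, 3), (0, 11, 4), (13, 24, 3)]`, for example, the search returns the four-pair helix together with the independent helix at positions 13 to 24 (in some order), with toy energy -5.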
Background and Aims: The noncoding regions in the 3'-untranslated region (UTR) of the hepatitis C virus (HCV) genome contain secondary structures that are important for replication. The aim of this study was to identify detailed conformational elements of the X-region involved in HCV replication. Methods: Ribonucleic acid (RNA) structural analogs X94, X12, and X12c were constructed to have identical conformation but 94%, 12%, and 0% sequence identity, respectively, to the X region of HCV genotype 2a. Effects of the structural analogs on the replication of HCV genotype 1b and 2a RNA were studied by quantitative reverse transcriptase polymerase chain reaction. Results: In replicon BB7 cells, a constitutive replication model, HCV RNA levels decreased to 55%, 52%, 53%, and 54% after transfection with expression plasmids generating RNA structural analogs 5B-46, X-94, X-12, and X-12c, respectively (p<0.001 for all). In an HCV genotype 2a infection model, RNA analogs 5B-46, X-94, and X-12 in hepatic cells inhibited replication to 11%, 9%, and 12%, respectively. Because the X-12 analog was only 12% identical to the corresponding sequence of HCV genotype 2a, the sequence per se or antisense effects were unlikely to be involved. Conclusions: The data suggest that the conformation of secondary structures in the 3'-UTR of the HCV RNA genome is required for HCV replication. Stable expression of RNA analogs predicted to have identical stem-loop structures might inhibit HCV infection of hepatocytes in the liver and may represent a novel approach to designing anti-HCV agents.
RNA secondary structure has become the most exploitable feature for ab initio detection of non-coding RNA (ncRNA) genes from genome sequences. Previous work has used Minimum Free Energy (MFE)-based methods to identify ncRNAs by measuring sequence fold stability and certainty. However, these methods yielded variable performance across different ncRNA species. Designing novel, reliable structural measures will help to develop effective ncRNA gene-finding tools. This paper introduces a new RNA structural measure based on a novel RNA secondary structure ensemble constrained by characteristics of native RNA tertiary structures. The new method makes it possible to achieve a performance leap over previous structure-based methods. Test results on standard ncRNA datasets (benchmarks) demonstrate that this method can effectively separate most ncRNA families from genome backgrounds.
Funding: Supported by the National Natural Science Foundation of China (No. 60971089).
Abstract: A novel method for the prediction of RNA secondary structure was proposed based on particle swarm optimization (PSO). PSO is known to be effective in solving many different types of optimization problems and for being able to approximate globally optimal results in the solution space. We designed an efficient objective function based on the minimum free energy, the number of selected stems, and the average length of the selected stems. We calculated how many legal stems there were in the sequence and used PSO to select a subset of them that is optimal with respect to the objective function. A method based on an improved particle swarm optimization (IPSO) was proposed to predict RNA secondary structure; it consisted of three stages. The first stage encoded the source sequences and explored all legal stems; a set of encoded stems was then created to prepare input data for the second stage. In the second stage, IPSO was responsible for structure selection. Finally, the optimal result was obtained from the secondary structures selected via IPSO. Nine sequences from the comparative RNA website were selected to evaluate the proposed method. According to the experimental results, compared with six other methods, the proposed method reduced complexity and improved sensitivity and specificity.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Abstract: Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has hit a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that performs better than four popular existing multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur where reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those reported in published papers.
Abstract: RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might be associated with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to be associated with less RNA structure in the 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure, which may help maintain efficient biological processes such as translation. Comparative analysis of the RNA structuromes of rice and Arabidopsis revealed that higher GC content does not lead to stronger structure or less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure correspond to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a layer of selection, separate from that on sequence, between monocots and dicots. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptive flexibility during angiosperm evolution.
Funding: the National Natural Science Foundation of China (No. 11601259); Shanghai Municipal Science and Technology Major Project (No. 2017SHZDZX01).
Abstract: Background: RNA structure is the crucial basis for RNA function in various cellular processes. Over the last decade, high-throughput structure profiling (SP) experiments have brought enormous insight into RNA secondary structure. Results: In this review, we first provide an overview of approaches for RNA secondary structure prediction, including free energy-based algorithms and comparative sequence analysis. Then we introduce SP technologies, databases that document SP data, and pipelines/algorithms to normalize and interpret SP data. Computational frameworks that incorporate SP data in RNA secondary structure prediction are also presented. Conclusions: We finally discuss potential directions for improvement in the prediction and differential analysis of RNA secondary structure.
Funding: supported by the National Key Technology R&D Program of China (Grant Nos. 2017YFA0103504 to XC, 2018ZX10301402 to JRY), the National Natural Science Foundation of China (Grant Nos. 31671320, 31871320, and 81830103 to JRY), and the start-up grants from the “100 Top Talents Program” of Sun Yat-sen University, China (Grant Nos. 50000-18821112 to XC, 50000-18821117 to JRY).
Abstract: The secondary structure is a fundamental feature of both non-coding RNAs (ncRNAs) and messenger RNAs (mRNAs). However, our understanding of the secondary structures of mRNAs, especially those of the coding regions, remains elusive. This is likely due to translation and to the lack of RNA-binding proteins that sustain a consensus structure, like those that bind ncRNAs. Indeed, mRNAs have recently been found to adopt diverse alternative structures, but the overall functional significance remains untested. We approach this problem by estimating the folding specificity, i.e., the probability that a fragment of an mRNA folds back to the same partner once refolded. We show that the folding specificity of mRNAs is lower than that of ncRNAs and exhibits moderate evolutionary conservation. Notably, we find that specific rather than alternative folding is likely evolutionarily adaptive, since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests that the ability to modulate ribosome movement is one potential functional advantage provided by specific folding. Our findings reveal a novel facet of the RNA structurome with important functional and evolutionary implications. They also indicate a potential method for distinguishing mRNA secondary structures maintained by natural selection from molecular noise.
Funding: the Chinese Ministry of Science and Technology (No. 2018YFA0107603 to Q.C.Z.), the National Natural Science Foundation of China (Nos. 91740204 and 31761163007 to Q.C.Z.; No. 61772197 to T.J.), and the National Key Research and Development Program of China (No. 2018YFC0910404 to T.J.).
Abstract: Background: RNA secondary structures play a pivotal role in post-transcriptional regulation and in the functions of non-coding RNAs, yet in vivo RNA secondary structures remain enigmatic. PARIS (Psoralen Analysis of RNA Interactions and Structures) is a recently developed, high-throughput, sequencing-based approach that enables direct capture of RNA duplex structures in vivo. However, PARIS data contain incompatible, fuzzy pairing information, which obstructs their integration with existing tools for reconstructing RNA secondary structure models at single-base resolution. Methods: We introduce IRIS, a method for predicting RNA secondary structure ensembles based on PARIS data. IRIS generates a large set of candidate RNA secondary structure models under the guidance of redistributed PARIS reads. It then uses a Bayesian model to identify the optimal ensemble according to both thermodynamic principles and PARIS data. Results: The RNA structure ensembles predicted by IRIS have been verified based on evolutionary conservation information and on consistency with other experimental RNA structural data. IRIS is implemented in Python and freely available at http://iris.zhanglab.net. Conclusion: IRIS capitalizes on PARIS data to improve the prediction of in vivo RNA secondary structure ensembles. We expect that IRIS will enhance the application of the PARIS technology and shed more light on in vivo RNA secondary structures.
Funding: the National Natural Science Foundation of China under Grant No. 60673018.
Abstract: Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure. Prediction of RNA secondary structure can be improved by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenetic analysis of homologous RNA sequences, which is probably why these methods are not ideal in practical applications. Hence, we present a new SCFG-based method that integrates phylogenetic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to depict the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenetic analysis of homologous RNA sequences, where λ is for non-structural regions of the sequence and λ' is for structural regions; 3) we merge λ and λ' into M to devise a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. Our method achieved higher sensitivity and specificity than Pfold.
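The core computation shared by SCFG-based methods is the inside algorithm. It gives the probability that the grammar generates a sequence, summed over all secondary structures. The sketch below illustrates this for a toy single-sequence grammar. It is not the profile SCFG M or the combined phylogenetic model described above. The grammar (S → x S | x S y S | ε), the production probabilities, and the emission tables are hypothetical values chosen for illustration. The input must be an uppercase A/C/G/U string.

```python
from typing import Dict, List, Tuple

# Toy grammar (assumption, not the paper's profile SCFG):
#   S -> x S        unpaired base x, prob P_UNPAIRED * E1[x]
#   S -> x S y S    base pair x-y,   prob P_PAIR * E2[(x, y)]
#   S -> epsilon    prob P_END
P_UNPAIRED, P_PAIR, P_END = 0.5, 0.3, 0.2
E1: Dict[str, float] = {b: 0.25 for b in "ACGU"}
CANONICAL = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
E2: Dict[Tuple[str, str], float] = {p: 1 / len(CANONICAL) for p in CANONICAL}


def inside(seq: str) -> float:
    """Return P(seq | toy grammar), summed over all structures, in O(n^3) time."""
    n = len(seq)
    # alpha[i][j] = probability that S derives seq[i:j] (half-open interval)
    alpha: List[List[float]] = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        alpha[i][i] = P_END  # empty substring via S -> epsilon
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # seq[i] unpaired, rest derived by S
            total = P_UNPAIRED * E1[seq[i]] * alpha[i + 1][j]
            # seq[i] paired with seq[k]; inner S derives seq[i+1:k], outer S seq[k+1:j]
            for k in range(i + 1, j):
                e = E2.get((seq[i], seq[k]), 0.0)
                if e:
                    total += P_PAIR * e * alpha[i + 1][k] * alpha[k + 1][j]
            alpha[i][j] = total
    return alpha[0][n]
```

For example, `inside("GC")` sums two derivations. One leaves both bases unpaired, giving 0.5·0.25·0.5·0.25·0.2. The other pairs G with C, giving 0.3·(1/6)·0.2·0.2. The total is 0.005125. A profile SCFG replaces the single-sequence emissions with column-wise scores from an alignment, and in the method above those scores come from the phylogenetic models λ and λ'.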
Funding: the National Natural Science Foundation of China (Grant No. 60621062) and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of MOE (TRAPOYT), China.
Abstract: We propose a novel model to predict RNA secondary structure based on fuzzy set theory. Through the fuzzy partition of state spaces and the incorporation of fuzzy goals, we can efficiently find the optimal fuzzy policy of the model using a fuzzy dynamic programming algorithm, and then determine optimal and suboptimal RNA secondary structures. Compared with existing sophisticated prediction models, such as Zuker's method and the SCFG model, our fuzzy model-based approach has several advantages: 1) computational complexity can be reduced by the fuzzy partition; 2) the optimal secondary structure and several suboptimal ones can be generated simultaneously; and 3) subjective prior knowledge can readily be incorporated. This paper presents a complete description of our fuzzy model and gives the implementation of the proposed method. We also apply the BJK fuzzy model structure to secondary structure prediction on datasets of tRNA and tmRNA sequences. We compared our fuzzy method with both the minimum free energy-based mfold tool and the BJK grammar model of SCFG. The experimental results validate the effectiveness of the proposed method and show further improved prediction accuracy.
Funding: supported by the National Natural Science Foundation of China (30571377) and the National High-Tech R&D Program of China (863 Program, 2006AA10A204).
Abstract: The attenuated vaccine strains of CSFV have a 12-nucleotide (nt) insertion in the 3'-UTR of the genome compared with CSFV virulent strains. In this study, we found distinct heterogeneity in the 3'-UTR of the attenuated Thiverval and HCLV strains. The longest 3'-UTR of the Thiverval strain was 259 base pairs (bp) with a 32-nt insertion, and the shortest was only 233 bp with a 6-nt insertion. The longest 3'-UTR of the HCLV strain was 244 bp with a 17-nt insertion, and the shortest was 235 bp with an 8-nt insertion. Comparison with published 3'-UTR sequences of vaccine and virulent strains showed that the 3'-UTR of CSFV vaccine strains has two variable regions where insertions are frequently found among different vaccine strains. The first is located between the second conserved TALk codon and the start of the T-rich region, where we found variable-length insertions within the same vaccine strain (Thiverval or HCLV). The second is located between the end of the T-rich region and the GAA codon; in this region, a 4-nt deletion was found in the virulent Shimen strain. These two regions may represent "hot spots" for mutation. Modeling the secondary structures of the 3'-UTR suggests that the T-rich insertion could change the structure and free energy, thus affecting the stability of the 3'-UTR structure. These findings will help to understand the mechanism of attenuated vaccines and improve vaccine safety, stability, and efficacy.
Funding: supported by the National Natural Science Foundation of China (31760507) and the National Key R&D Program of China (2018YFC1706301).
Abstract: The potato rot nematode (Ditylenchus destructor) is an economically important nematode of agronomic and horticultural plants worldwide. In this study, 43 populations of D. destructor were collected from different hosts across China, including 37 populations from Chinese herbal medicine plants. The obtained ITS-rDNA and 28S-rDNA D2–D3 sequences of D. destructor were compared and analyzed. Nine types of significant length variation in ITS sequences were observed among all populations. The differences in ITS1 length were mainly caused by the presence of repetitive elements with substantial base substitutions. Reconstructions of ITS1 secondary structures showed that the minisatellites formed a stem structure. Ten haplotypes were observed across all populations based on mutations and variations of helix H9. Compared with the 7 haplotypes (A–G) of Subbotin's system, 3 known haplotypes (A–C) were found in 7 populations isolated from potato, sweet potato, and Codonopsis pilosula. Another 7 unique haplotypes were found in the other 36 populations, collected from C. pilosula and Angelica sinensis. These unique haplotypes differed from haplotypes A–G, and we named them haplotypes H–N. The present results show that a total of 14 haplotypes (A–N) of ITS-rDNA have been found in D. destructor. Phylogenetic analyses of ITS-rDNA and D2–D3 showed that all populations of D. destructor clustered into two major clades: one containing only haplotype A from sweet potato, and the other containing haplotypes B–N from other plants. For further verification, PCR-ITS-RFLP profiles were generated for the 7 new haplotypes. Collectively, our study suggests that D. destructor populations on Chinese medicinal materials are very different from those on other hosts, and this work provides a paradigm for related research.
Abstract: Several experiments and observations have revealed that small, local, distinct structural features in RNA molecules are correlated with their biological function, for example in the post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information about which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even when limited to secondary structure. The main difficulty is twofold. In nearly all cases the structure of the molecules is unknown and has to be predicted somehow, and sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (local, global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but work less well when similarity is limited to small, local elements, such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information about sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns with a symmetric layout, such as those forming stem-loop structures. 
Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
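To make the "symmetric layout" of a stem-loop concrete, the sketch below finds candidate hairpins in a single sequence by brute force. It tries every loop position and length, then extends the stem outward while the flanking bases can pair (Watson-Crick or G-U). This is a naive stand-in written for illustration. It does not use the affix tree and does not compare structures across sequences as the described algorithm does. The function name `find_hairpins` and the default thresholds are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs (RNA alphabet)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq: str, min_stem: int = 4, min_loop: int = 3,
                  max_loop: int = 8) -> List[Tuple[int, int, int, int]]:
    """Naive stem-loop scan (illustrative; not an affix-tree search).

    Returns (start, end, stem_len, loop_len) tuples; seq[start:end] is the hairpin.
    """
    s = seq.upper().replace("T", "U")  # accept DNA input as well
    n = len(s)
    hits = []
    for loop_start in range(n):
        for loop_len in range(min_loop, max_loop + 1):
            loop_end = loop_start + loop_len  # exclusive end of the loop
            if loop_end > n:
                break
            stem = 0
            # Extend the stem outward symmetrically while both flanks pair.
            while (loop_start - 1 - stem >= 0 and loop_end + stem < n
                   and (s[loop_start - 1 - stem], s[loop_end + stem]) in PAIRS):
                stem += 1
            if stem >= min_stem:
                hits.append((loop_start - stem, loop_end + stem, stem, loop_len))
    return hits


print(find_hairpins("GGGGAAAACCCC"))  # [(0, 12, 4, 4)]
```

The scan costs roughly O(n · loop range · stem length) per sequence and may report overlapping variants of the same hairpin. Index structures such as the affix tree avoid this repeated work when many sequences must be searched for similar patterns.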
Funding: supported by the National Natural Science Foundation of China (Grant No. 31671355) and the National Thousand Young Talents Program of China (to QCZ).
Abstract: RNA folds into intricate structures that are crucial for its functions and regulation. To date, a multitude of approaches for probing the structures of the whole transcriptome, i.e., RNA structuromes, have been developed. Applications of these approaches to different cell lines and tissues have generated a rich resource for studying RNA structure-function relationships at a systems biology level. In this review, we first introduce the designs of these methods and their applications to different RNA structuromes. We emphasize their technological differences, especially their unique advantages and caveats. We then summarize the structural insights into RNA functions and regulation obtained from studies of RNA structuromes. Finally, we propose potential directions for future improvements and studies.
Funding: supported by the National Key Research and Development Program of China (2021YFE0114900), the National Natural Science Foundation of China (91940303, 91940306, 32025008, 32170262, 31922039, U1832215, 32170229), the Natural Science Foundation of Zhejiang Province (LD21C050002), the Starry Night Science Fund at Shanghai Institute for Advanced Study of Zhejiang University (SN-ZJU-SIAS-009), the Beijing Advanced Innovation Center for Structural Biology, the Shenzhen Basic Research Project (JCYJ20180507181642811), the Research Grants Council of the Hong Kong SAR, China Projects (CityU 11100421, CityU 11101519, CityU 11100218, N_CityU110/17), the Croucher Foundation Project (9509003), the State Key Laboratory of Marine Pollution Director Discretionary Fund, City University of Hong Kong Projects (7005503, 9667222, 9680261), and the United Kingdom Biotechnology and Biological Sciences Research Council (BBSRC: BBS/E/J/000PR9788).
Abstract: RNA structures are essential to support RNA functions and regulation in various biological processes. Recently, a range of novel technologies have been developed to decode genome-wide RNA structures and novel modes of functionality across a wide range of species. In this review, we summarize key strategies for probing the RNA structurome and discuss the pros and cons of representative technologies. In particular, these new technologies have been applied to dissect the structural landscape of the SARS-CoV-2 RNA genome. We also summarize the functionalities of RNA structures discovered in different regulatory layers (including RNA processing, transport, localization, and mRNA translation) across viruses, bacteria, animals, and plants. We review many versatile RNA structural elements in the context of different physiological and pathological processes (e.g., cell differentiation, stress response, and viral replication). Finally, we discuss future prospects for RNA structural studies: mapping the RNA structurome at higher resolution and at the single-molecule and single-cell level, and deciphering novel modes of RNA structures and functions for innovative applications.
Abstract: Background: Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry, combined with high-throughput sequencing, have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data of ever-increasing diversity and complexity have been generated, giving rise to new challenges in interpreting and analyzing these data. Results: We review current practices in the analysis of structure profiling data, with emphasis on comparative and integrative analysis, and highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. Conclusions: To keep pace with experimental developments, methods to facilitate, enhance, and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
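One common way to integrate profiling data into free energy-based prediction is to add a per-nucleotide pseudo-free-energy term. This term penalizes pairing at highly reactive nucleotides. The form below is the one popularized for SHAPE data and is shown only as an illustration; the review may discuss other schemes. Here $r_i$ is the normalized reactivity of nucleotide $i$, and $m$ (slope) and $b$ (intercept) are empirically fitted parameters. Values around $m \approx 2.6$ and $b \approx -0.8$ kcal/mol are often cited, but treat them as an assumption, because software packages have used different defaults. The term is added for each nucleotide in a stacked base pair:

$$
\Delta G_{\text{total}}(S) = \Delta G_{\text{NN}}(S) + \sum_{i \in \text{stacked}(S)} \Delta G_{\text{SP}}(i),
\qquad
\Delta G_{\text{SP}}(i) = m \,\ln\!\left(r_i + 1\right) + b
$$

For a nucleotide with $r_i = 0$, the term reduces to $b$, a small bonus for pairing. For $r_i = 1$ it is $m \ln 2 + b \approx 2.6 \times 0.693 - 0.8 \approx 1.0$ kcal/mol, a penalty that discourages pairing.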
Abstract: U7 small nuclear RNA (snRNA) sequences have previously been described for only a handful of animal species. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates, including the upstream sequence elements characteristic of snRNAs transcribed by RNA polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure. We also report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on the putative U7 homologs in even more distant organisms reported in the most recent release of the Rfam database.
Funding: support from the NSERC for this research under Research Grant Number RG-PIN 238298. Both authors would like to acknowledge the support of the InfoNet Media Centre, funded by the Canadian Foundation for Innovation (CFI) under grant number CFI-3648.
Abstract: Purpose: The purpose of this paper is to present a study of the effect of different types of annealing schedules on a ribonucleic acid (RNA) secondary structure prediction algorithm based on simulated annealing (SA). Design/methodology/approach: An RNA folding algorithm was implemented that assembles the final structure from potential substructures (helixes). Structures are encoded as a permutation of helixes, and SA searches this space of permutations. Parameters and annealing schedules were studied and fine-tuned to optimize algorithm performance. Findings: Compared with mfold, the SA algorithm shows comparable results (in terms of F-measure), even with a less sophisticated thermodynamic model. In terms of average specificity, the SA algorithm outperformed mfold. Research limitations/implications: Most of the underlying thermodynamic models are too simplistic and incomplete to accurately model the free energy of larger structures. This is the largest limitation of free energy-based RNA folding algorithms in general. Practical implications: The algorithm offers a different approach that can be used in practice to fold RNA sequences quickly. Originality/value: The algorithm is one of only two SA-based RNA folding algorithms. The authors use a very different encoding, based on permutations of candidate helixes. The in-depth study of annealing schedules and other parameters makes the algorithm a strong contender. Another benefit is that new thermodynamic models can be incorporated with relative ease, which is not the case for algorithms based on dynamic programming.
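The permutation encoding can be sketched as follows. A permutation of candidate helixes is decoded greedily: each helix is added, in order, if it shares no bases with already chosen helixes and does not cross them (no pseudoknots). SA then explores permutations by swapping two positions and uses a geometric cooling schedule. This is a minimal illustration under loud assumptions, not the authors' implementation. The energy is a hypothetical toy score (-1 per base pair) rather than a thermodynamic model, and the helix tuple format, function names, and schedule parameters (`t0`, `alpha`, `steps_per_t`, `t_min`) are all invented for illustration.

```python
import math
import random
from typing import List, Tuple

Helix = Tuple[int, int, int]  # hypothetical format (i, j, length): pairs (i+k, j-k)


def pairs_of(h: Helix) -> List[Tuple[int, int]]:
    i, j, length = h
    return [(i + k, j - k) for k in range(length)]


def compatible(h1: Helix, h2: Helix) -> bool:
    """True if the helixes share no bases and their pairs do not cross."""
    p1, p2 = pairs_of(h1), pairs_of(h2)
    used = {b for p in p1 for b in p}
    if any(b in used for p in p2 for b in p):
        return False
    return not any(a < c < b < d or c < a < d < b for a, b in p1 for c, d in p2)


def decode(perm: List[int], helices: List[Helix]) -> List[Helix]:
    """Greedy decoding: add helixes in permutation order when compatible."""
    chosen: List[Helix] = []
    for idx in perm:
        if all(compatible(helices[idx], g) for g in chosen):
            chosen.append(helices[idx])
    return chosen


def energy(chosen: List[Helix]) -> int:
    # Toy energy (assumption): -1 per base pair, no nearest-neighbor model.
    return -sum(h[2] for h in chosen)


def anneal(helices: List[Helix], t0: float = 2.0, alpha: float = 0.95,
           steps_per_t: int = 50, t_min: float = 0.01, seed: int = 0):
    if not helices:
        return [], 0
    rng = random.Random(seed)
    perm = list(range(len(helices)))
    rng.shuffle(perm)
    cur_e = energy(decode(perm, helices))
    best_perm, best_e = perm[:], cur_e
    t = t0
    while t > t_min:  # geometric cooling schedule
        for _ in range(steps_per_t):
            a, b = rng.randrange(len(perm)), rng.randrange(len(perm))
            perm[a], perm[b] = perm[b], perm[a]  # neighbor: swap two positions
            new_e = energy(decode(perm, helices))
            delta = new_e - cur_e
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_e = new_e  # Metropolis acceptance
                if cur_e < best_e:
                    best_perm, best_e = perm[:], cur_e
            else:
                perm[a], perm[b] = perm[b], perm[a]  # reject: undo the swap
        t *= alpha
    return decode(best_perm, helices), best_e
```

In a small example, helix (2, 18, 5) clashes with both (0, 20, 4) and (6, 14, 3), while those two nest compatibly. The search therefore settles on the pair of nested helixes (7 base pairs) rather than the single 5-pair helix. Replacing `energy` with a real free energy model is what the last sentence of the abstract refers to, since the search itself does not depend on the model's form.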
Abstract: Background and Aims: The noncoding 3'-untranslated region (UTR) of the hepatitis C virus (HCV) genome contains secondary structures that are important for replication. The aim of this study was to identify detailed conformational elements of the X region involved in HCV replication. Methods: Ribonucleic acid (RNA) structural analogs X94, X12, and X12c were constructed to have identical conformation but 94%, 12%, and 0% sequence identity, respectively, to the X region of HCV genotype 2a. Effects of the structural analogs on replication of HCV genotype 1b and 2a RNA were studied by quantitative reverse transcriptase polymerase chain reaction. Results: In replicon BB7 cells, a constitutive replication model, HCV RNA levels decreased to 55%, 52%, 53%, and 54% after transfection with expression plasmids generating the RNA structural analogs 5B-46, X-94, X-12, and X-12c, respectively (p<0.001 for all). In an HCV genotype 2a infection model, RNA analogs 5B-46, X-94, and X-12 in hepatic cells inhibited replication to 11%, 9%, and 12%, respectively. Because the X-12 analog was only 12% identical to the corresponding sequence of HCV genotype 2a, sequence-specific or antisense effects were unlikely to be involved. Conclusions: The data suggest that the conformation of secondary structures in the 3'-UTR of the HCV RNA genome is required for HCV replication. Stable expression of RNA analogs predicted to have identical stem-loop structures might inhibit HCV infection of hepatocytes in the liver and may represent a novel approach to designing anti-HCV agents.
Funding: supported in part by NSF MRI 0821263, NIH BISTI grant R01GM072080-01A1, an NIH ARRA Administrative Supplement to NIH BISTI R01GM072080-01A1, and NSF IIS award No. 0916250.
Abstract: RNA secondary structure has become the most exploitable feature for ab initio detection of non-coding RNA (ncRNA) genes in genome sequences. Previous work has used minimum free energy (MFE)-based methods to identify ncRNAs by measuring sequence fold stability and certainty. However, these methods yielded variable performance across different ncRNA species. Designing novel, reliable structural measures will help to develop effective ncRNA gene-finding tools. This paper introduces a new RNA structural measure based on a novel RNA secondary structure ensemble constrained by characteristics of native RNA tertiary structures. The new method achieves a performance leap over previous structure-based methods. Test results on standard ncRNA datasets (benchmarks) demonstrate that this method can effectively separate most ncRNA families from genome backgrounds.