Several experiments and observations have shown that small, distinct local structural features in RNA molecules are correlated with their biological function, for example in post-transcriptional regulation of gene expression. Finding similar structural features in a set of RNA sequences known to play the same biological role could therefore provide substantial information about which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even when limited to secondary structure. The main difficulty is that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (local, global, or restricted to selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present searches directly for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information about sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns with a symmetric layout, such as those forming stem-loop structures. Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as iron responsive elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
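
As a rough illustration of what searching for regions that can fold into a stem-loop involves, the Python sketch below grows a stem outwards from every candidate loop, extending the match to the left and to the right at the same time. It is not the affix-tree algorithm described in the abstract: it is a naive scan without any index, it allows no mismatches or bulges, and it reduces "similar structure" to having the same stem and loop lengths. The function names and parameters (`find_stem_loops`, `shared_shapes`, `min_stem`, `min_loop`, `max_loop`) are hypothetical, and accepting G-U wobble pairs alongside Watson-Crick pairs is an assumption.

```python
# Watson-Crick pairs plus the G-U wobble pair (including wobble is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=4, min_loop=3, max_loop=8):
    """Return (start, end, stem_length, loop_length) for each hairpin candidate.

    For every possible loop, the stem is grown outwards one base pair at a
    time, extending to the left and to the right simultaneously. Positions
    are 0-based and inclusive.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # last base of the 5' arm
            right = loop_start + loop_len  # first base of the 3' arm
            stem = 0
            while left >= 0 and right < n and (seq[left], seq[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, right - 1, stem, loop_len))
    return hits


def shared_shapes(seqs, **kwargs):
    """Return the (stem_length, loop_length) shapes found in every sequence."""
    shape_sets = [
        {(stem, loop) for _, _, stem, loop in find_stem_loops(s, **kwargs)}
        for s in seqs
    ]
    return set.intersection(*shape_sets) if shape_sets else set()
```

For instance, `find_stem_loops("GGGGAAAACCCC")` reports a single hairpin spanning the whole sequence, with a 4-base-pair stem and a 4-nucleotide loop. Because each candidate is built by extending a match in both directions from the loop, an index that supports bidirectional extension is a natural fit for this kind of symmetric pattern, which is the role the abstract assigns to the affix tree.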