
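RNA Secondary Structure Prediction Based on Support Vector Machine Classification

Chinese title: 基于支持向量机分类的RNA共同二级结构预测 (RNA consensus secondary structure prediction based on support vector machine classification). Cited by: 1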
Abstract: Comparative sequence analysis is the most reliable approach to RNA secondary structure prediction, and many algorithms based on it have been developed over the last several decades. This paper treats structure prediction by this approach as a binary classification problem: given the information available from a sequence alignment, decide whether any two columns of the alignment can form a base pair. A support vector machine (SVM) serves as the classifier, and the feature vector comprises covariation information, thermodynamic information, and the fraction of complementary bases. Because covariation information depends on sequence similarity, a similarity weight factor is introduced to adjust the contributions of covariation and thermodynamic information to the prediction at different levels of sequence similarity, which improved prediction accuracy. Tests on 49 Rfam-seed alignments showed that the method is effective: its accuracy was better than that of most comparable algorithms, and it can also predict simple pseudoknots.
Source: Chinese Journal of Biotechnology (《生物工程学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 7, pp. 1140-1148 (9 pages).
Keywords: comparative sequence analysis, RNA secondary structure, support vector machine, similarity weight factor
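The abstract frames prediction as deciding, for each pair of alignment columns, whether the columns form a base pair, using features such as covariation and the fraction of complementary bases. The following is a minimal sketch of how two such features might be computed for one column pair. It is not the paper's implementation. The use of mutual information as the covariation score, the inclusion of G-U wobble pairs, the function names, and the toy alignment are all assumptions made for illustration. The thermodynamic feature, the similarity weight factor, gap handling, and the SVM itself are omitted.

```python
import math
from collections import Counter

# Assumption: Watson-Crick pairs plus the G-U wobble pair count as
# "complementary"; the abstract does not list the accepted pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def complementary_fraction(col_i, col_j):
    """Fraction of sequences whose bases at the two columns can pair."""
    hits = sum((a, b) in PAIRS for a, b in zip(col_i, col_j))
    return hits / len(col_i)


def mutual_information(col_i, col_j):
    """Mutual information in bits between two columns.

    Assumption: mutual information stands in for the covariation score;
    the abstract does not specify which measure the paper uses.
    """
    n = len(col_i)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def pair_features(alignment, i, j):
    """Hypothetical feature vector [covariation, complementary fraction]."""
    col_i, col_j = column(alignment, i), column(alignment, j)
    return [mutual_information(col_i, col_j), complementary_fraction(col_i, col_j)]


# Toy alignment (made up): columns 0 and 5 covary, columns 1 and 4 are conserved.
toy_alignment = ["GCAUGC", "ACAUGU", "CCAUGG", "UCAUGA"]
covarying = pair_features(toy_alignment, 0, 5)
conserved = pair_features(toy_alignment, 1, 4)
```

In this toy case both column pairs are fully complementary, but only the covarying pair has a nonzero mutual information. A perfectly conserved pair carries no covariation signal, which is consistent with the abstract's motivation for reweighting covariation against thermodynamic information when sequence similarity is high.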

