Abstract
Comparative sequence analysis is the most reliable approach to RNA secondary structure prediction, and many algorithms based on it have been developed over the last several decades. This paper treats structure prediction by this approach as a binary classification problem: given the information available in a sequence alignment, decide whether any two columns of the alignment form a base pair. A support vector machine (SVM) serves as the classifier, with a feature vector comprising co-variation information, thermodynamic information, and the fraction of complementary bases. Because co-variation information depends on sequence similarity, a similarity weight factor is introduced to adjust the contributions of co-variation and thermodynamic information to the prediction at different levels of sequence similarity, which improves prediction accuracy. Tests on 49 Rfam-seed alignments demonstrate the effectiveness of the method: its prediction accuracy exceeds that of most comparable algorithms, and it can also predict simple pseudoknots.
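The column-pair formulation above can be illustrated with a minimal sketch that computes two of the three feature types for a pair of alignment columns. The abstract does not give the paper's formulas, so the choices here are assumptions: mutual information stands in for the co-variation score, G-U wobble pairs count as complementary, and gap characters are not treated specially. The thermodynamic feature, the similarity weight factor, and the SVM itself are omitted because the abstract does not specify them. All function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: canonical Watson-Crick pairs plus the G-U wobble pair count as complementary.
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column(alignment, i):
    """Return column i of the alignment as a list of characters."""
    return [seq[i] for seq in alignment]


def complementary_fraction(col_i, col_j):
    """Fraction of sequences whose bases at the two columns can pair."""
    pairs = list(zip(col_i, col_j))
    return sum(p in COMPLEMENTARY for p in pairs) / len(pairs)


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two columns.

    Used here as a stand-in co-variation score; the paper's own
    score may be defined differently.
    """
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def pair_features(alignment, i, j):
    """Feature vector [co-variation, complementary fraction] for columns i and j."""
    col_i, col_j = column(alignment, i), column(alignment, j)
    return [mutual_information(col_i, col_j), complementary_fraction(col_i, col_j)]
```

In a full pipeline, one such vector per candidate column pair (extended with a thermodynamic term) would be passed to a binary classifier that labels the pair as paired or unpaired.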
Source
Chinese Journal of Biotechnology (《生物工程学报》)
CAS
CSCD
PKU Core Journals (北大核心)
2008, Issue 7, pp. 1140-1148 (9 pages)
Keywords
comparative sequence analysis
RNA secondary structure
support vector machine
similarity weight factor