Abstract
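To address the long input sequences and large number of parameters required by traditional gene splice site recognition methods, this paper proposes a variable length Markov model based on the KL divergence (Kullback-Leibler divergence-variable length Markov model, KL-VLMM). The model improves on the variable length Markov model by using the KL divergence, instead of the original probability ratio, to decide the direction in which the sequence is extended. This effectively improves the recognition of feature sequences, and the model order is reduced from second to first, which lowers the space complexity of the algorithm. The recognition performance of the model was compared with that of other traditional methods on the human splice site database N269. The experimental results show that the KL-VLMM method predicts human gene splice sites better.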
In this paper, a variable length Markov model based on the Kullback-Leibler divergence (Kullback-Leibler divergence-variable length Markov model, KL-VLMM) is proposed for human splice site prediction, to avoid the long sequences and many parameters required by traditional methods. In this method, the direction in which the sequence is extended to form the detection features for each candidate splice site is chosen according to the KL divergence, instead of the likelihood ratio at each position. Furthermore, the order of the model is reduced from second to first. As a result, the predictive capability of the model is effectively improved, and the space complexity of the prediction algorithm is reduced as well. To test the performance of the KL-VLMM method, two experiments were carried out with it and, for comparison, with traditional methods such as VLMM and the support vector machine (SVM), using the human splice site database N269. The experimental results comprised prediction accuracy and receiver operating characteristic (ROC) curves. Compared with the other two methods, the KL-VLMM method achieved the highest prediction accuracy, and its ROC curve lay above the others. These results show the effectiveness of the KL-VLMM method.
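The abstract does not spell out how the KL divergence selects the extension direction, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes aligned, equal-length sequences over A/C/G/T, per-position nucleotide frequencies with additive smoothing, and a rule that extends the current window toward the neighbouring position whose true-site distribution diverges most from the decoy distribution. The names `position_distribution`, `kl_divergence`, and `choose_extension` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def position_distribution(seqs, pos, pseudocount=1.0):
    """Smoothed nucleotide frequencies at one aligned position."""
    counts = Counter(s[pos] for s in seqs)
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return [(counts[b] + pseudocount) / total for b in ALPHABET]


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; p and q must be strictly positive."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def choose_extension(true_sites, decoys, left, right):
    """Pick the side on which to extend the window [left, right].

    Hypothetical rule: compare the KL divergence between true-site and
    decoy distributions at the position just left of the window and just
    right of it, and extend toward the larger value.
    Returns "left", "right", or None if the window cannot grow.
    """
    length = len(true_sites[0])
    scores = {}
    if left - 1 >= 0:
        scores["left"] = kl_divergence(
            position_distribution(true_sites, left - 1),
            position_distribution(decoys, left - 1),
        )
    if right + 1 < length:
        scores["right"] = kl_divergence(
            position_distribution(true_sites, right + 1),
            position_distribution(decoys, right + 1),
        )
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy data (invented for illustration, not taken from N269).
true_sites = ["AGGTA", "CGGTA", "AGGTG", "TGGTA"]
decoys = ["AGGTC", "CAGTT", "TCGTG", "GTGTA"]
direction = choose_extension(true_sites, decoys, left=2, right=3)
```

In this toy data the window covers the conserved GT dinucleotide at positions 2 and 3; the rule compares position 1 with position 4 and extends toward whichever separates true sites from decoys more strongly.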
Source
《生物物理学报》
CAS
CSCD
PKU Core Journal
2011, Issue 8, pp. 719-726 (8 pages)
Acta Biophysica Sinica
Keywords
Variable length Markov model
Splice site recognition
KL divergence