Abstract
To identify splice sites more accurately and efficiently, a recognition method based on comprehensive information is proposed. By analyzing the splicing signals, the splicing sequences, the secondary structure of the flanking sequences, and the mechanisms of action of the splicing factors at donor and acceptor sites, a signal model and a sequence model are built for donor sites and for acceptor sites, respectively. The Mfold package in the Vienna software is used to predict the most stable secondary structure of the sequence flanking each splice site. The traditional four-letter nucleotide alphabet is then converted into an eight-letter alphabet, so that each sequence is described with eight symbols. These combined sequence-structure strings are used to train the signal and sequence models, and the trained models are then applied to recognize splice sites. Experimental results show that the method recognizes splice sites well, with a recognition accuracy above 95%.
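The conversion from a four-letter to an eight-letter alphabet can be illustrated with a minimal sketch. The abstract does not state which eight symbols are used, so the mapping here is an assumption: each base is kept in uppercase when it is unpaired and written in lowercase when it is paired in the predicted structure. The function name `to_eight_letter` and the dot-bracket input format are likewise hypothetical choices for illustration, not the authors' implementation.

```python
def to_eight_letter(sequence: str, structure: str) -> str:
    """Merge a nucleotide sequence and its dot-bracket structure into one string.

    Assumed encoding (not taken from the paper):
      unpaired base ('.')      -> uppercase A, C, G, U
      paired base ('(' or ')') -> lowercase a, c, g, u
    T is normalized to U so the alphabet has exactly eight symbols.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    encoded = []
    for base, mark in zip(sequence.upper().replace("T", "U"), structure):
        if base not in "ACGU":
            raise ValueError(f"unexpected base: {base!r}")
        if mark == ".":
            encoded.append(base)          # unpaired position
        elif mark in "()":
            encoded.append(base.lower())  # paired position
        else:
            raise ValueError(f"unexpected structure symbol: {mark!r}")
    return "".join(encoded)


if __name__ == "__main__":
    # A toy hairpin: three paired bases, a four-base loop, three paired bases.
    print(to_eight_letter("GGGAAAUCCC", "(((....)))"))
```

Strings produced this way carry both the base identity and its pairing state at every position, which is the kind of combined sequence-structure input the abstract describes for training the signal and sequence models.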
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2011, No. 3, pp. 111-114 (4 pages)
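Funding
National Natural Science Foundation of China (61071174)
National High Technology Research and Development Program of China (2008AA01Z148)
Heilongjiang Provincial Science Fund for Distinguished Young Scholars (JC200703)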
Keywords
bioinformatics
splice sites
splicing signal
alternative splicing
secondary structure