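Abstract
Splice site recognition, an important step in gene identification, has long attracted researchers' attention. Because the sequences near splice sites are conserved, several statistics-based methods have been applied to splice site recognition, but their performance still leaves room for improvement. The support vector machine (SVM), a new type of learning machine based on statistical learning theory, has developed rapidly in recent years and has been applied to many pattern recognition problems. This paper applies it to splice site recognition and, because pseudo splice sites far outnumber real ones among the sequence samples that satisfy the GT-AG rule, proposes an SVM-based Balance-Minimum approach to obtain better recognition performance. Experimental results show that SVM-based splice site recognition extracts the statistical features of the conserved sequences near the sites more effectively, generalizes better to the test set, and is simpler to use. This result provides a new method for splice site recognition and a new lead for solving structure and site recognition problems in the study of biological macromolecules.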
As a novel type of general learning machine based on statistical learning theory, the support vector machine (SVM) has received much attention in recent years, largely because of its excellent performance on several pattern recognition problems. In this paper, SVM is applied to predict splice sites in DNA sequences. Because pseudo splice sites, which satisfy the GT-AG rule but are not real splice sites, far outnumber real splice sites, a new SVM-based prediction approach, called the Balance-Minimum approach, is proposed for better prediction performance. Experiments give encouraging results. Compared with neural-network-based approaches, the SVM-based approach has better prediction performance on the testing set of potential splice sites and is easier to use (users do not need to select a network structure or initialize a number of parameters). These results also provide a new lead for other functional site prediction tasks.
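The class imbalance described in the abstract can be illustrated with a minimal Python sketch. It enumerates every GT (candidate donor) and AG (candidate acceptor) dinucleotide in a sequence, which is why pseudo sites greatly outnumber real ones. The names `candidate_sites`, `one_hot_window`, and `flank` are hypothetical, and the one-hot window encoding is an assumed input representation for a classifier such as an SVM; the abstract does not specify the features used or the details of the Balance-Minimum approach.

```python
def candidate_sites(seq, flank=3):
    """Return (donors, acceptors): start positions of every GT / AG
    dinucleotide that has at least `flank` bases on each side."""
    seq = seq.upper()
    donors, acceptors = [], []
    # The last valid start leaves room for the dinucleotide plus the right flank.
    for i in range(flank, len(seq) - 1 - flank):
        dinucleotide = seq[i:i + 2]
        if dinucleotide == "GT":
            donors.append(i)      # candidate donor (5' end of an intron)
        elif dinucleotide == "AG":
            acceptors.append(i)   # candidate acceptor (3' end of an intron)
    return donors, acceptors


def one_hot_window(seq, pos, flank=3):
    """Encode the window around a candidate site as a flat list with
    4 entries per base (assumed encoding, not taken from the paper)."""
    codes = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
             "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
    window = seq.upper()[pos - flank:pos + 2 + flank]
    features = []
    for base in window:
        # Unknown bases (e.g. N) become an all-zero block.
        features.extend(codes.get(base, (0, 0, 0, 0)))
    return features


if __name__ == "__main__":
    sequence = "CCAGGTAAGTCC"  # toy sequence, not real data
    donors, acceptors = candidate_sites(sequence, flank=2)
    print("candidate donors:", donors)
    print("candidate acceptors:", acceptors)
    print("feature length:", len(one_hot_window(sequence, donors[0], flank=2)))
```

Every candidate found this way satisfies the GT-AG rule, but only a small fraction would be real splice sites, so a training set built from such candidates is heavily skewed toward negatives.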
Source
《生物物理学报》
CAS
CSCD
PKU Core Journals
1999, Issue 4, pp. 733-739 (7 pages)
Acta Biophysica Sinica
Keywords
Gene identification
Support vector machine
Splice site
Recognition
Support vector machine (SVM); Splice sites; Prediction