Abstract
Identifying functional sites in DNA sequences is an active research topic in bioinformatics, and splice site identification is one such problem. To make full use of the characteristic patterns of splice sites and thereby improve recognition accuracy, a splice site identification system based on an improved Winnow algorithm is presented. Compared with other methods, the improved Winnow algorithm is more robust: it is suited to high-dimensional feature spaces, can fuse several kinds of pattern information, and performs well even when many irrelevant features are present. During training, the feature set is also pruned to remove features that contribute almost nothing to recognition; this yields a significant speedup with negligible loss in performance. Experiments show that the improved Winnow algorithm identifies splice sites well: several of its performance measures match or exceed those of widely used splice site identification software, making it comparable to the best predictors and considerably better than most systems.
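The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of the classic Winnow scheme (mistake-driven multiplicative weight updates) with a simple weight-threshold pruning step. The feature encoding (position, base), the parameter values `alpha`, `theta` and `prune_below`, the pruning criterion, and the toy sequences are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Feature = Tuple[int, str]  # hypothetical encoding: (position in window, nucleotide)


def encode(window: str) -> Set[Feature]:
    """Return the active binary features of a DNA window."""
    return {(i, base) for i, base in enumerate(window.upper())}


class Winnow:
    """Classic Winnow with an assumed pruning rule (not the paper's variant)."""

    def __init__(self, alpha: float = 2.0, theta: float = 4.0,
                 prune_below: float = 0.3) -> None:
        self.alpha = alpha              # promotion/demotion factor
        self.theta = theta              # decision threshold
        self.prune_below = prune_below  # assumed pruning cutoff
        self.w: Dict[Feature, float] = {}   # unseen features default to 1.0
        self.pruned: Set[Feature] = set()   # pruned features contribute 0

    def score(self, feats: Set[Feature]) -> float:
        return sum(self.w.get(f, 1.0) for f in feats if f not in self.pruned)

    def predict(self, feats: Set[Feature]) -> bool:
        return self.score(feats) >= self.theta

    def update(self, feats: Set[Feature], label: bool) -> None:
        """Change weights only on a mistake, multiplicatively."""
        if self.predict(feats) == label:
            return
        # Missed positive: promote active features; false alarm: demote them.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in feats:
            if f not in self.pruned:
                self.w[f] = self.w.get(f, 1.0) * factor

    def prune(self) -> int:
        """Drop features whose weight fell below the cutoff; return the count."""
        weak = {f for f, weight in self.w.items() if weight < self.prune_below}
        for f in weak:
            del self.w[f]
        self.pruned |= weak
        return len(weak)


def train(model: Winnow, data: List[Tuple[str, bool]], epochs: int = 5) -> None:
    for _ in range(epochs):
        for window, label in data:
            model.update(encode(window), label)
        model.prune()  # prune once per pass over the data


# Toy windows: positives end in "GT" (a made-up stand-in for a donor-site signal).
TOY_DATA: List[Tuple[str, bool]] = [
    ("AAGT", True), ("AAAC", False), ("CCGT", True), ("CCTA", False),
]
```

Because weights change only on mistakes and by a constant factor, irrelevant features are driven down quickly, which is the property the abstract relies on for high-dimensional feature spaces. A real system would use much longer windows and richer features than this toy encoding.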
Source
Life Science Research (《生命科学研究》)
CAS
CSCD
2005, No. 3, pp. 218-226 (9 pages)
Funding
National Natural Science Foundation of China (Grant No. 60471003)
Keywords
splice site identification
improved Winnow algorithm
information fusion
multiplicative weight update method
feature analysis