Abstract
The support vector machine (SVM) is a relatively new machine learning method that follows the structural risk minimization principle and handles high-dimensional feature spaces well, so it has been widely used in biological sequence analysis. Drawing on the characteristics of gene sequences, this paper proposes a new kernel function, the position weight subsequence kernel. The kernel combines the composition features of the subsequences in a gene sequence with their position information, and so represents sequence characteristics fairly fully. The kernel is applied to the identification of gene splice sites. The experimental results show that an SVM using the position weight subsequence kernel identifies splice sites effectively and achieves higher identification precision than other methods.
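The abstract does not state the kernel's formula, so the following is only a minimal sketch of the general idea it describes: a string kernel that rewards shared subsequences and weights each match by how close the two occurrences are in position. The function names, the subsequence length `k`, the Gaussian position weight, and its width `sigma` are all assumptions made for illustration, not the definition used in the paper.

```python
import math


def kmers(seq, k):
    """Return (position, subsequence) pairs for every length-k window of seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def position_weighted_kernel(x, y, k=3, sigma=2.0):
    """Illustrative position-weighted subsequence kernel (hypothetical form).

    Every pair of identical length-k subsequences, one from x and one from y,
    contributes a Gaussian weight that decays with the distance between
    their start positions. This is an assumed weighting, not the paper's.
    """
    total = 0.0
    for i, u in kmers(x, k):
        for j, v in kmers(y, k):
            if u == v:  # composition: the same subsequence occurs in both
                # position: nearby occurrences count more than distant ones
                total += math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
    return total


def gram_matrix(seqs, k=3, sigma=2.0):
    """Kernel values for all pairs of sequences, as a list of lists."""
    return [[position_weighted_kernel(a, b, k, sigma) for b in seqs]
            for a in seqs]
```

A matrix built this way could, for example, be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Whether this particular weighting matches the reported results is not something the abstract allows one to check.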
Source
Computer Simulation (《计算机仿真》)
CSCD
2006, No. 9, pp. 69-71 (3 pages)
Funding
National Natural Science Foundation of China (60471003)
Keywords
Support vector machine (SVM)
Kernel function
Biological sequence analysis