Abstract
In many cases, researchers have some knowledge of how the data to be classified were generated, and this information can supply valuable features for a kernel function. Many studies have derived kernels from generative models; the marginalized kernel is one of the more recent results. Building on marginalized kernel theory, this paper uses the distance between feature vectors in the marginalized kernel's feature space as the measure of similarity and constructs a distance-based marginalized kernel. The new kernel and the original marginalized kernel are then both applied to classifying bacterial gyrase subunit B (GyrB) amino acid sequences. Experimental results show that the distance-based marginalized kernel achieves better recognition accuracy than the original marginalized kernel and also shows a degree of generalization capability.
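The abstract does not give the kernel formulas, so the following is only an illustrative sketch under stated assumptions, not the paper's implementation. It assumes the first-order marginalized count kernel of Tsuda et al. (2002), in which an HMM's posterior state probabilities weight symbol counts, and it assumes the distance-based variant maps the squared Euclidean distance between feature vectors to a Gaussian similarity exp(-d²/(2σ²)). The function names, `sigma`, and the toy HMM parameters are hypothetical.

```python
import math


def state_posteriors(seq, start, trans, emit):
    """Scaled forward-backward for a discrete HMM.

    seq[i]      : symbol index at position i
    start[q]    : P(h_0 = q)
    trans[p][q] : P(h_{i+1} = q | h_i = p)
    emit[q][k]  : P(x_i = k | h_i = q)
    Returns post[i][q] = P(h_i = q | seq).
    """
    n, length = len(start), len(seq)
    alpha, scale = [], []
    for i, sym in enumerate(seq):
        if i == 0:
            a = [start[q] * emit[q][sym] for q in range(n)]
        else:
            a = [sum(alpha[-1][p] * trans[p][q] for p in range(n)) * emit[q][sym]
                 for q in range(n)]
        c = sum(a)  # per-step scaling factor keeps values away from underflow
        alpha.append([v / c for v in a])
        scale.append(c)
    beta = [[1.0] * n for _ in range(length)]
    for i in range(length - 2, -1, -1):
        nxt = seq[i + 1]
        b = [sum(trans[p][q] * emit[q][nxt] * beta[i + 1][q] for q in range(n))
             for p in range(n)]
        beta[i] = [v / scale[i + 1] for v in b]
    post = []
    for i in range(length):
        g = [alpha[i][q] * beta[i][q] for q in range(n)]
        z = sum(g)
        post.append([v / z for v in g])
    return post


def count_features(seq, post, n_symbols):
    """Marginalized count features: gamma[k, q] = (1/L) * sum_i P(h_i = q | x) * [x_i = k],
    flattened to index k * n_states + q."""
    n = len(post[0])
    gamma = [0.0] * (n_symbols * n)
    for sym, p in zip(seq, post):
        for q in range(n):
            gamma[sym * n + q] += p[q]
    return [g / len(seq) for g in gamma]


def marginalized_count_kernel(f1, f2):
    """Original marginalized count kernel: inner product of the feature vectors."""
    return sum(a * b for a, b in zip(f1, f2))


def distance_kernel(f1, f2, sigma=1.0):
    """ASSUMED distance-based variant: Gaussian of the squared Euclidean distance
    between feature vectors, computed via the kernel trick."""
    k = marginalized_count_kernel
    d2 = max(k(f1, f1) + k(f2, f2) - 2.0 * k(f1, f2), 0.0)  # clamp rounding noise
    return math.exp(-d2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Toy 2-state HMM over a 3-letter alphabet; all parameters are made up.
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    x, y = [0, 1, 2, 2], [0, 0, 1, 2]
    fx = count_features(x, state_posteriors(x, start, trans, emit), 3)
    fy = count_features(y, state_posteriors(y, start, trans, emit), 3)
    print(marginalized_count_kernel(fx, fy), distance_kernel(fx, fy))
```

Because the squared distance is expanded as K(x,x) + K(x',x') - 2K(x,x'), the same construction works for any kernel whose feature map is not written out explicitly. With a single-state HMM the features reduce to plain symbol frequencies, which is a quick sanity check.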
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal
2007, No. 14, pp. 3501-3503, 3507 (4 pages)
Keywords
kernel design
marginalized kernels
kernel feature space
biological sequence classification
hidden Markov model (HMM)
Euclidean distance