Abstract
Support vector machine (SVM), a classical classification method, has been widely used in many fields. However, the standard SVM has three limitations in classification decisions: (1) it does not consider the distribution characteristics of the data being classified; (2) it ignores the relative relationships between sample classes; and (3) it cannot handle large-scale classification problems. To address these issues, a rank preservation learning machine based on data distribution fusion (RPLM-DDF) is proposed. The method introduces within-class scatter to characterize the data distribution, keeps the relative positions of the class centers unchanged to preserve the global sample order, and handles large-scale classification by establishing the equivalence between the proposed method and the dual form of the core vector machine (CVM). Comparative experiments on artificial datasets, small- and medium-scale datasets, and large-scale datasets verify the effectiveness of RPLM-DDF.
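The two quantities named in the abstract, within-class scatter and the relative order of class centers, can be illustrated with a minimal NumPy sketch. It assumes the standard Fisher-style definition of within-class scatter (the sum, over classes, of the scatter of samples around their class mean); the paper's exact formulation may differ. The function names `within_class_scatter` and `projected_center_order`, and the projection direction `w`, are hypothetical and used only for illustration.

```python
import numpy as np


def within_class_scatter(X, y):
    """Within-class scatter matrix (assumed Fisher-style definition):
    sum over classes of (x - class_mean)(x - class_mean)^T."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X[y == label]
        centered = X_c - X_c.mean(axis=0)  # deviations from the class center
        S_w += centered.T @ centered
    return S_w


def projected_center_order(X, y, w):
    """Class labels sorted by the projection of each class center onto w.
    Hypothetical helper: checks whether a direction w keeps the class
    centers in a given relative order."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    labels = np.unique(y)
    proj = [X[y == label].mean(axis=0) @ w for label in labels]
    return [labels[i] for i in np.argsort(proj)]


if __name__ == "__main__":
    X = [[0, 0], [2, 0], [5, 5], [5, 7]]
    y = [0, 0, 1, 1]
    print(within_class_scatter(X, y))
    print(projected_center_order(X, y, [1.0, 0.0]))
```

A small within-class scatter along the decision direction means each class is tightly grouped after projection, and comparing `projected_center_order` for candidate directions shows whether the ordering of class centers is preserved.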
Authors
LIU Zhongbao (刘忠宝)
ZHANG Zhijian (张志剑)
DANG Jianfei (党建飞)
School of Software, North University of China, Taiyuan 030051, China
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
PKU Core Journals (北大核心)
2020, No. 3, pp. 431-440 (10 pages)
Funding
Supported by the National Social Science Fund of China (19BTQ012).
Keywords
within-class scatter
support vector machine (SVM)
large-scale labeled datasets
global rank preservation
core vector machine (CVM)