Abstract
A method for training a one-class support vector machine (OCSVM) on large-scale data sets is proposed. Based on the k-nearest-neighbor principle, the method selects inner points that represent the distribution characteristics of the original data set and then uses these inner points to generate edge points. The two kinds of points are combined into a new data set on which the OCSVM is trained. The new data set greatly reduces the number of samples while preserving the distribution characteristics of the original large-scale data set, which addresses the long training time, complex models, and low prediction efficiency that OCSVM faces on large-scale data sets. Finally, comparative experiments on typical data sets demonstrate the effectiveness of the proposed method.
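The abstract does not state how the inner points are selected or how the edge points are generated, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes two hypothetical rules: inner points are the samples with the smallest mean distance to their k nearest neighbors, and each edge point is obtained by mirroring the neighborhood centroid through its inner point. The names `knn`, `condense`, `k`, and `keep_ratio` are illustrative and do not come from the paper.

```python
import math
import random


def knn(points, i, k):
    """Return indices of the k nearest neighbors of points[i] (brute force)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def condense(points, k=5, keep_ratio=0.1):
    """Hypothetical condensation: dense 'inner' points plus generated 'edge' points."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # Assumed density score: mean distance to the k nearest neighbors.
    score = [
        sum(math.dist(points[i], points[j]) for j in neigh[i]) / k
        for i in range(n)
    ]
    n_keep = max(1, int(n * keep_ratio))
    # Assumed selection rule: keep the samples with the lowest score.
    inner_idx = sorted(range(n), key=score.__getitem__)[:n_keep]
    inner = [points[i] for i in inner_idx]
    # Assumed edge rule: mirror the neighborhood centroid through the inner point.
    edge = []
    for i in inner_idx:
        dim = len(points[i])
        centroid = [sum(points[j][d] for j in neigh[i]) / k for d in range(dim)]
        edge.append(tuple(2 * points[i][d] - centroid[d] for d in range(dim)))
    return inner, edge


random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
inner, edge = condense(data, k=5, keep_ratio=0.1)
reduced = inner + edge  # 60 points instead of 300
```

The reduced set could then be passed to any OCSVM trainer in place of the full data, for example `sklearn.svm.OneClassSVM().fit(reduced)`. The brute-force neighbor search is quadratic in the number of samples and is used here only to keep the sketch dependency-free.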
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journals
2013, Issue A01, pp. 206-209 (4 pages)
Journal of Southeast University: Natural Science Edition
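Funding
National High Technology Research and Development Program of China (863 Program), Grant No. 2011AA060203
National Basic Research Program of China (973 Program), Grant No. 2009CB320602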
Keywords
one-class support vector machine
large-scale data sets
training-set condensation