Abstract
To address the slow training of support vector classifiers on large-scale data sets, a cluster-based data set reduction method for Support Vector Machines (C-SVM) is proposed. First, a sample centre-distance function is established and used to compute the ratio radius of the cluster set. The sample points are then scanned with the cluster-set mirror to determine the cluster classes. Within a cluster whose samples all share the same class, only representative sample points are retained, and a deletion matrix for points of differing classes is built; together these steps reduce the sample set markedly. The cluster reduction algorithm is proved to have low time complexity, and experiments illustrate the value of retaining representative points. Finally, experiments on random data and the UCI standard databases confirm that the algorithm speeds up classification while preserving classification accuracy, and that accuracy can be gained by adjusting the threshold value.
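The reduction steps summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the centre-distance function, the cluster mirror, or the deletion matrix, so everything below is an assumption made for illustration. The hypothetical function `reduce_by_clusters` and its parameter `radius_ratio` take the cluster radius to be a fraction of the mean distance from each sample to the centroid of its own class, scan the samples greedily, keep one representative per single-class cluster, and keep every sample of a mixed-class cluster, since such samples lie near the class boundary.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def reduce_by_clusters(points, labels, radius_ratio=0.5):
    """Return the sorted indices of the samples kept after reduction.

    Assumption (not from the paper): the cluster radius equals radius_ratio
    times the mean distance of every sample to the centroid of its own class.
    """
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(p)
    centres = {y: centroid(ps) for y, ps in by_class.items()}
    mean_dist = sum(
        math.dist(p, centres[y]) for p, y in zip(points, labels)
    ) / len(points)
    radius = radius_ratio * mean_dist

    kept, assigned = [], set()
    for i, seed in enumerate(points):
        if i in assigned:
            continue
        # Greedy scan: gather every unassigned sample within the radius.
        members = [
            j for j, q in enumerate(points)
            if j not in assigned and math.dist(seed, q) <= radius
        ]
        assigned.update(members)
        if len({labels[j] for j in members}) == 1:
            # Single-class cluster: keep the member nearest the cluster mean.
            c = centroid([points[j] for j in members])
            kept.append(min(members, key=lambda j: math.dist(points[j], c)))
        else:
            # Mixed-class cluster: keep all members (likely boundary points).
            kept.extend(members)
    return sorted(kept)


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
    labs = [0, 0, 0, 1, 1, 1]
    print(reduce_by_clusters(pts, labs, radius_ratio=3.0))
```

The reduced index list would then select the training subset passed to any SVM trainer. The greedy scan is quadratic in the number of samples, so it makes no claim to match the time complexity stated in the abstract.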
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2009, Issue 3, pp. 184-188 (5 pages)
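Funding
National Natural Science Foundation of China (Grant Nos. 10501009 and 10661005)
Supported by the Guilin University of Electronic Technology (桂电) Soft Environment Project and the Youth Fund of Anhui University of Finance and Economics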