Abstract
Support vector machines (SVMs), which are grounded in statistical learning theory, offer good classification performance, but the high computational cost of training severely limits their use on large-scale data. To address large-scale training, this paper proposes a support vector classification method based on fuzzy kernel clustering. The method combines kernel functions with fuzzy clustering to first filter out a portion of redundant sample points, which reduces the number of samples entering SVM training, greatly improves training efficiency, and lets the SVM handle large-scale, high-dimensional classification. The method is evaluated on four high-dimensional data sets from the UCI database. The experimental results show that fuzzy kernel clustering can filter out a large number of redundant sample points, and that the accuracy of the SVM trained on the filtered samples is not lower than that of the SVM trained on all samples.
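The abstract does not spell out the clustering or filtering rule, so the following is only a minimal sketch of the general idea, under stated assumptions: a Gaussian-kernel fuzzy c-means with prototypes kept in input space, followed by a hypothetical rule that keeps only samples whose highest membership falls below a threshold (samples that sit between clusters) and drops the rest as redundant. The function names, the `gamma`, `m`, and `threshold` values, the farthest-point initialization, and the toy data are all illustrative choices, not taken from the paper.

```python
import math

def rbf(x, v, gamma):
    """Gaussian kernel K(x, v) = exp(-gamma * ||x - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, v)))

def init_centers(points, n_clusters):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        def gap(p):
            return min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        centers.append(list(max(points, key=gap)))
    return centers

def kernel_fcm(points, n_clusters=2, m=2.0, gamma=0.5, n_iter=50):
    """Kernel fuzzy c-means; returns (centers, memberships).

    memberships[k][i] is the degree to which points[k] belongs to cluster i.
    """
    eps = 1e-12  # guards against division by zero when a point sits on a center
    centers = init_centers(points, n_clusters)
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: with a Gaussian kernel, the squared feature-space
        # distance between x and v is proportional to 1 - K(x, v).
        memberships = []
        for x in points:
            w = [(1.0 / (1.0 - rbf(x, v, gamma) + eps)) ** (1.0 / (m - 1.0))
                 for v in centers]
            total = sum(w)
            memberships.append([wi / total for wi in w])
        # Center update: mean of the points weighted by u^m * K(x, v).
        for i in range(n_clusters):
            coef = [memberships[k][i] ** m * rbf(x, centers[i], gamma)
                    for k, x in enumerate(points)]
            denom = sum(coef) + eps
            centers[i] = [sum(c * x[d] for c, x in zip(coef, points)) / denom
                          for d in range(dim)]
    return centers, memberships

def filter_samples(memberships, threshold=0.9):
    """Hypothetical rule: keep indices whose top membership is below threshold."""
    return [k for k, u in enumerate(memberships) if max(u) < threshold]

# Toy data: two tight groups plus two points lying between them.
group_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (-0.2, 0.1), (0.0, -0.2)]
group_b = [(4.0, 4.0), (4.2, 3.9), (3.9, 4.3), (3.8, 4.0), (4.1, 3.8)]
between = [(2.0, 2.0), (1.9, 2.1)]
points = group_a + group_b + between

centers, memberships = kernel_fcm(points)
kept = filter_samples(memberships)
print(kept)
```

On this toy set, only the two in-between points survive the filter, and those would be the samples passed on to SVM training. Whether the paper keeps boundary samples, cluster representatives, or something else is not stated in the abstract, so the threshold rule should be read as one plausible instantiation rather than the authors' procedure.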
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2013, Issue A02, pp. 108-110, 132 (4 pages in total)
Funding
National Natural Science Foundation of China (61202078)
Keywords
fuzzy clustering
kernel function
data classification
Support Vector Machine (SVM)