Abstract
The support vector machine (SVM) is an optimal classification criterion for linear classifiers, proposed by Vapnik et al. on the basis of statistical learning theory, and it is widely used for classification problems in text, image, speech, and other domains. LIBSVM, LIBLINEAR, and SVMmulticlass are binary or multiclass classifier toolkits built on the SVM principle. All three implement optimization-based classification of data, but each has its own characteristics. Classification results differ across datasets of different scales, that is, datasets with different ratios of sample count to feature count. This paper therefore analyzes the classification performance of each toolkit in three respects: training time, classification accuracy (Precision), and the linear kernel function employed. From this analysis it identifies the advantages and disadvantages of each toolkit, in order to offer practical guidance to researchers who use them. Experimental results show that, for linearly separable data, LIBLINEAR combines short training time with high classification accuracy, which makes it well suited to classifying large-scale data.
Source
《电子技术(上海)》
2015, No. 6, pp. 1-5 (5 pages)
Electronic Technology