Abstract
To improve learning performance on large data sets, we propose an incremental learning algorithm for the semi-supervised TSVM (transductive support vector machine) based on LSH (locality-sensitive hashing). The algorithm first exploits the ability of LSH to find similar data quickly, selecting from the first increment the samples that are similar to the labeled samples. It then trains a TSVM to obtain the support vectors (SVs) and selects, from each later increment, the unlabeled samples that are likely to become support vectors. These selected samples, together with the existing SVs and the labeled samples, form the basis for subsequent training. Finally, the algorithm is validated on multiple data sets. The experiments show that the proposed semi-supervised TSVM incremental learning algorithm effectively improves both training speed and classification accuracy.
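The LSH filtering step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state which hash family the authors use, so the sign-random-projection (random hyperplane) hash below is an assumption, as are the function names and the parameters `n_tables`, `n_bits`, and `seed`. The TSVM training step is not shown; the sketch only covers selecting the unlabeled samples that fall into the same hash bucket as some labeled sample.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def make_hyperplanes(dim: int, n_bits: int, rng: random.Random) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes through the origin (assumed hash family)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(x: Vector, planes: List[List[float]]) -> Tuple[int, ...]:
    """Hash a vector to one bit per hyperplane: 1 if it lies on the non-negative side."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0 else 0
        for plane in planes
    )


def select_similar(
    labeled: Sequence[Vector],
    unlabeled: Sequence[Vector],
    n_tables: int = 4,
    n_bits: int = 3,
    seed: int = 0,
) -> List[int]:
    """Return sorted indices of unlabeled samples that share a bucket
    with at least one labeled sample in at least one hash table."""
    rng = random.Random(seed)
    dim = len(labeled[0])
    selected = set()
    for _ in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, rng)
        labeled_keys = {lsh_key(x, planes) for x in labeled}
        for i, x in enumerate(unlabeled):
            if lsh_key(x, planes) in labeled_keys:
                selected.add(i)
    return sorted(selected)


# Hypothetical toy data: one labeled sample and three unlabeled candidates.
labeled_samples = [[1.0, 2.0]]
unlabeled_samples = [[2.0, 4.0], [-1.0, -2.0], [1.0, 2.0]]
picked = select_similar(labeled_samples, unlabeled_samples)
print(picked)
```

With this hash family, a sample pointing in the same direction as a labeled sample always collides with it, while a sample pointing in the opposite direction never does. In an incremental setting, the selected indices would identify the candidates passed on to TSVM training together with the labeled samples and existing support vectors.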
Source
《浙江工业大学学报》
CAS
PKU Core Journal (北大核心)
2018, No. 2, pp. 127-131 (5 pages)
Journal of Zhejiang University of Technology
Funding
Supported by the Natural Science Foundation of Zhejiang Province (LZ14F030001)