Abstract
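In practical classification tasks, it is common for unlabeled samples to be plentiful while labeled samples are scarce, and semi-supervised self-training classification is currently a common way to handle this situation. This paper proposes a density-based semi-supervised self-training classification algorithm. The algorithm first partitions the data set according to data density to determine the spatial structure of the data, then iterates the self-training of the classifier following that spatial structure, and finally obtains a new classifier. Experimental results on six UCI data sets show that the proposed algorithm achieves better classification performance than three supervised learning algorithms and their respective self-training versions.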
A common problem in many practical applications is that unlabeled samples are plentiful while labeled ones are scarce. A successful method for tackling this problem is self-training semi-supervised classification. This paper introduces a self-training semi-supervised classification method in which the entire data set is divided into three parts based on data density, so that the real structure of the data space can be found. It then proposes a framework for self-training semi-supervised classification in which the structure of the data space is integrated into the self-training iterative process to help train a better classifier. Experiments on six data sets from UCI show that the classifier obtained by the proposed method performs better than those obtained by supervised methods with few labeled samples and by the standard self-training semi-supervised classification method.
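The abstract does not specify the density measure, the partition rule, or the base classifier, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes density is the inverse mean distance to the k nearest neighbours, that the unlabeled points are cut into three equal-sized groups by descending density, and that a nearest-centroid classifier is retrained after each group is pseudo-labeled. All function names (`knn_density`, `split_three`, `density_self_train`) and the toy data are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_density(points, k=3):
    """Assumed density: inverse of the mean distance to the k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (sum(d[:k]) / k + 1e-12))
    return dens

def split_three(indices, density):
    """Assumed partition: sort by descending density, cut into three parts."""
    order = sorted(indices, key=lambda i: density[i], reverse=True)
    n = len(order)
    a, b = math.ceil(n / 3), math.ceil(2 * n / 3)
    return [order[:a], order[a:b], order[b:]]

def fit_centroids(X, y):
    """Nearest-centroid base classifier (a stand-in for any supervised learner)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, x):
    """Return the label of the closest class centroid."""
    return min(centroids, key=lambda c: dist(centroids[c], x))

def density_self_train(X_lab, y_lab, X_unl, k=3):
    """Self-training that pseudo-labels unlabeled points from dense to sparse."""
    X, y = list(X_lab), list(y_lab)
    density = knn_density(list(X_lab) + list(X_unl), k)
    unl_density = density[len(X_lab):]
    pseudo = [None] * len(X_unl)
    for group in split_three(range(len(X_unl)), unl_density):
        model = fit_centroids(X, y)          # retrain on the current labeled set
        for i in group:
            pseudo[i] = predict(model, X_unl[i])
        for i in group:                      # absorb the newly labeled group
            X.append(X_unl[i])
            y.append(pseudo[i])
    return fit_centroids(X, y), pseudo

# Toy data: two well-separated clusters with one labeled point each.
X_lab = [(0.0, 0.0), (5.0, 5.0)]
y_lab = ["a", "b"]
X_unl = [(0.2, 0.1), (0.1, 0.3), (1.0, 1.2), (4.8, 5.1), (5.2, 4.9), (4.0, 3.9)]
model, pseudo = density_self_train(X_lab, y_lab, X_unl)
print(pseudo)
```

Processing dense regions first means the earliest pseudo-labels are assigned where the cluster structure is clearest, which is one plausible reading of integrating the data-space structure into the self-training iterations.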
Authors
艾震鹏
王振友
Ai Zhenpeng; Wang Zhenyou (College of Applied Mathematics, Guangdong University of Technology, Guangzhou 510520, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2019, No. 4, pp. 1072-1074 (3 pages)
Application Research of Computers
Funding
Guangzhou Science and Technology Program (201707010435)
Guangdong Province Graduate Education Innovation and Reform Project (2014JGXM-MS17)
Keywords
semi-supervised learning
self-training
density
classification