Abstract
To improve the effectiveness of semi-supervised classification, a semi-supervised classification method based on the idea of cross validation (CV-S3VM) is proposed. Each unlabeled sample is given a pseudo-label and added to the labeled sample set, which then takes part in cross validation; the label that minimizes the SVM classifier's error is selected as the sample's final label. In this way the information hidden in the unlabeled samples is mined sample by sample, and the number of labeled samples is increased. UCI datasets are used to simulate a semi-supervised classification setting. The results show that CV-S3VM achieves a relatively high classification rate, and the effect is more pronounced when few labeled samples are available.
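The labeling procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cv_error`, `cv_pseudo_label`, and `make_clf` are hypothetical, the fold scheme and `k=3` are arbitrary choices, and a tiny nearest-centroid classifier stands in for the SVM so that the sketch runs with the standard library alone. Any classifier object with `fit` and `predict` methods, such as an SVM, could presumably be passed in its place.

```python
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier (NOT the paper's SVM); assumed for illustration only."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        # One centroid per class: the per-feature mean of that class's points.
        self.centroids = {
            label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()
        }
        return self

    def predict(self, X):
        def nearest(x):
            return min(
                self.centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
            )
        return [nearest(x) for x in X]


def cv_error(X, y, make_clf, k=3):
    """k-fold cross-validation error rate (interleaved folds, an assumption)."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        clf = make_clf().fit([X[i] for i in train], [y[i] for i in train])
        preds = clf.predict([X[i] for i in test])
        errors += sum(p != y[i] for p, i in zip(preds, test))
    return errors / n


def cv_pseudo_label(X_lab, y_lab, X_unlab, make_clf=NearestCentroid, k=3):
    """Label unlabeled samples one at a time by minimizing cross-validation error."""
    X, y = list(X_lab), list(y_lab)
    classes = sorted(set(y_lab))
    assigned = []
    for x in X_unlab:
        # Try every candidate pseudo-label and keep the one with the lowest CV error.
        best = min(classes, key=lambda c: cv_error(X + [x], y + [c], make_clf, k))
        X.append(x)  # the newly labeled sample enlarges the labeled set
        y.append(best)
        assigned.append(best)
    return assigned


X_lab = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_lab = [0, 0, 0, 1, 1, 1]
X_unlab = [[0.5, 0.5], [5.5, 5.5]]
print(cv_pseudo_label(X_lab, y_lab, X_unlab))  # -> [0, 1]
```

Because each newly labeled sample is appended before the next one is processed, later decisions depend on earlier ones; the processing order is therefore a design choice that the abstract does not specify.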
Source
Journal of Southwest University of Science and Technology (《西南科技大学学报》)
CAS
2014, No. 1, pp. 34-38, 48 (6 pages in total)
Funding
Supported by the Scientific Research Program of the Education Department of Shaanxi Province (12JK0748)
Keywords
Machine learning
Semi-supervised classification
Cross validation
Support vector machine