Abstract
To maintain disagreement among sub-classifiers while improving the performance of each individual classifier, a new semi-supervised ensemble learning method based on margin sample addition, termed Margin Co-Forest (M-Co-Forest), is proposed. When pseudo-labeled samples are selected from the unlabeled data, both the labeling confidence of an unlabeled sample and its position relative to the labeled samples are considered: only samples that lie at the margin of the current classifier's training set and whose labeling confidence exceeds a preset threshold are pseudo-labeled and added to the next round of training. In addition, noise-learning theory is introduced to guide the training process; when the number of pseudo-labeled samples is no longer sufficient to further improve classifier performance, the iteration stops. Experimental results on multiple UCI datasets and on CTG data show that M-Co-Forest outperforms the comparison algorithms.
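The selection rule summarized above can be illustrated with a minimal sketch, assuming a scikit-learn random forest as the ensemble and a k-nearest-neighbour distance heuristic as the "margin" criterion; both are illustrative assumptions, since the paper's exact margin definition and confidence estimator are not given in this abstract.

# Minimal sketch (not the authors' implementation) of the pseudo-label
# selection rule described in the abstract: an unlabeled sample is accepted
# only if (a) the current ensemble's labeling confidence exceeds a preset
# threshold and (b) the sample lies near the margin of the labeled data.
# The k-NN distance heuristic for "margin" is an assumption made here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def select_pseudo_labeled(X_labeled, y_labeled, X_unlabeled,
                          conf_threshold=0.75, margin_quantile=0.8, k=5):
    """Return indices and pseudo-labels of unlabeled samples to add."""
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_labeled, y_labeled)

    # Labeling confidence: highest class probability from the ensemble.
    proba = clf.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)
    pseudo_labels = clf.classes_[proba.argmax(axis=1)]

    # "Margin" heuristic (assumption): samples whose mean distance to their
    # k nearest labeled neighbours falls in the upper quantile, i.e. they
    # sit at the edge of the region covered by the current training set.
    nn = NearestNeighbors(n_neighbors=k).fit(X_labeled)
    dist, _ = nn.kneighbors(X_unlabeled)
    mean_dist = dist.mean(axis=1)
    margin_cut = np.quantile(mean_dist, margin_quantile)

    keep = (confidence >= conf_threshold) & (mean_dist >= margin_cut)
    return np.where(keep)[0], pseudo_labels[keep]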
Authors
Liu Ziyang; Gao Zhanbao; Li Xulong (School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China)
Source
Chinese Journal of Scientific Instrument (《仪器仪表学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2018, No. 3, pp. 45-53 (9 pages)
Keywords
semi-supervised learning
co-training
ensemble learning
margin sample