Abstract
Tri-Training is one of the representative algorithms of semi-supervised co-training. It uses statistical techniques to estimate labeling confidence and combines them with noise learning theory to classify unlabeled samples. When the augmented training set does not satisfy the conditions of noise learning theory, the algorithm falls back on random sampling. Because randomly selecting the augmented training set for each base classifier can introduce noise, this paper changes how the augmented training set is selected and removes samples that are likely to increase the classification error. A series of validation experiments on a health big-data set shows that the improved algorithm outperforms the original algorithm and lowers the classification error rate.
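For orientation, the random-sampling fallback mentioned above can be sketched as follows. This is a minimal illustration of the subsampling step in the original Tri-Training algorithm as it is commonly formulated, not the improved selection method proposed here, whose details the abstract does not give. The function names, parameters, and the fixed random seed are assumptions made for the example, and the sketch assumes the previous round's set size is already known.

```python
import math
import random


def subsample_size(prev_err, prev_size, err):
    """Largest size s such that err * s < prev_err * prev_size."""
    return math.ceil(prev_err * prev_size / err - 1)


def select_additions(candidates, err, prev_err, prev_size, rng=None):
    """Decide which newly labeled samples one base classifier may add.

    candidates: samples on which the other two classifiers agree
    err, prev_err: estimated error rates in this round and the previous one
    prev_size: number of samples added in the previous round
    Returns (chosen_samples, accept_flag).
    """
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    candidates = list(candidates)

    # No update unless the error dropped and the candidate set grew.
    if err >= prev_err or prev_size >= len(candidates):
        return [], False

    # Noise-learning condition holds: accept every candidate.
    if err * len(candidates) < prev_err * prev_size:
        return candidates, True

    # Condition fails: randomly subsample down to a size that satisfies it.
    if prev_size > err / (prev_err - err):
        size = subsample_size(prev_err, prev_size, err)
        return rng.sample(candidates, size), True

    return [], False
```

The random draw in the last branch is the step that can pull in noisy samples, which is the weakness the improved selection targets.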
Funding
Guangdong Provincial Science and Technology Program Project (No. 2014A090906004)