Abstract
A small number of labeled training samples degrades the performance of printing registration recognition models. To address this problem, this study proposes a semi-supervised method that combines an oversampling pretreatment of safe samples with co-training. First, the k-nearest neighbor method is used to identify the safe samples in the training set. Oversampling is then performed among the safe samples to generate synthetic samples, and a new training set is formed by combining the original training set with the synthetic samples. Next, Bootstrap sampling is used to draw three training subsets from the new training set, and a decision tree base classifier is trained on each subset. The base classifiers repeatedly predict labels for unlabeled samples, the newly labeled samples are added to the training subsets, and the base classifiers are updated until their performance stabilizes. The three base classifiers are then integrated into the final classification model for printing registration recognition. Experimental results show that the classification performance of the proposed method improves gradually as the number of training samples increases. With 800 training samples, the proposed method reaches a classification accuracy (Accuracy) of 98% and a geometric mean of recalls (G-mean) of 99% on the test set, both higher than those of the other methods in the experiment trained on the same number of samples. The proposed method can effectively exploit unlabeled samples to improve model performance and achieve printing registration recognition with a small training set.
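The abstract does not spell out how a safe sample is defined or how synthetic samples are generated, so the following is only a minimal sketch under stated assumptions: a sample is treated as safe when more than half of its k nearest neighbors share its label, and new samples are created by SMOTE-style linear interpolation between two safe samples of the same class. The function names (`safe_indices`, `oversample_safe`, `g_mean`) and the toy data are hypothetical, and the co-training stage with three decision trees is not shown. The G-mean helper follows the abstract's description of the metric as the geometric mean of the recalls.

```python
import math
import random


def k_nearest(idx, X, k):
    """Indices of the k samples closest to X[idx] (Euclidean), excluding itself."""
    dists = sorted((math.dist(X[idx], X[j]), j) for j in range(len(X)) if j != idx)
    return [j for _, j in dists[:k]]


def safe_indices(X, y, k=3):
    """Assumed rule: a sample is 'safe' if most of its k neighbours share its label."""
    safe = []
    for i in range(len(X)):
        same = sum(1 for j in k_nearest(i, X, k) if y[j] == y[i])
        if same > k / 2:
            safe.append(i)
    return safe


def oversample_safe(X, y, n_new, k=3, seed=0):
    """Append n_new synthetic samples interpolated between same-class safe samples."""
    rng = random.Random(seed)
    by_class = {}
    for i in safe_indices(X, y, k):
        by_class.setdefault(y[i], []).append(i)
    # Only classes with at least two safe samples can be interpolated.
    usable = sorted(c for c, idxs in by_class.items() if len(idxs) >= 2)
    new_X, new_y = [], []
    for _ in range(n_new if usable else 0):
        c = rng.choice(usable)
        a, b = rng.sample(by_class[c], 2)
        t = rng.random()  # position along the segment between the two samples
        new_X.append([xa + t * (xb - xa) for xa, xb in zip(X[a], X[b])])
        new_y.append(c)
    return list(X) + new_X, list(y) + new_y


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hit = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hit / total)
    return math.prod(recalls) ** (1 / len(recalls))


# Toy data (hypothetical): the last sample carries label 1 but lies inside
# the class-0 cluster, so the assumed rule does not treat it as safe.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [0.05, 0.05]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_aug, y_aug = oversample_safe(X, y, n_new=4)
print(len(X_aug), "samples after oversampling")
```

Restricting interpolation to safe samples keeps synthetic points away from regions where the classes overlap, which is presumably why the pretreatment precedes the Bootstrap split. In a fuller pipeline, `X_aug` and `y_aug` would be resampled into three subsets and passed to the base classifiers.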
Authors
陈伟
简川霞
CHEN Wei; JIAN Chuan-xia (School of Art, Ningbo City College of Vocational Technology, Ningbo 315100, China; College of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China)
Source
《数字印刷》
CAS
Peking University Core Journals (北大核心)
2022, No. 2, pp. 52-60 (9 pages)
Digital Printing
Funding
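Scientific Research Project of the Zhejiang Provincial Department of Education (No. Y202147591)
Guangdong Provincial Key Laboratory of Cyber-Physical Systems Project (No. 2016B030301008)
Guangdong University of Technology Youth Fund Key Project (No. 17QNZD001)
College Student Innovation and Entrepreneurship Training Program (No. yj202111845031)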
Keywords
Co-training
Semi-supervised learning
Printing registration
Decision tree