
Semi-supervised classification algorithm based on weight diversity (cited by: 2)
Abstract In practice, large numbers of system data samples are easy to obtain, but only a small fraction of them come with accurate labels. To obtain a better classification model, semi-supervised learning was introduced and the Unlabeled Data to Enhance Ensemble Diversity (UDEED) algorithm was improved, yielding UDEED+, a semi-supervised classification algorithm based on weight diversity. Building on the prediction disagreement of base learners on unlabeled data, UDEED+ adds a weight diversity loss in which the disagreement between base learners is represented by the cosine similarity of their weights. The diversity of the model is thus extended from different perspectives of the loss function, and unlabeled data are used during training to encourage diversity among the learners in the ensemble, with the aim of improving the performance and generalization of the classification model. Comparative experiments were conducted on 8 UCI public datasets against the semi-supervised algorithms UDEED, Safe Semi-Supervised Support Vector Machine (S4VM) and Semi-Supervised Weak-Label (SSWL). Compared with UDEED, UDEED+ improved accuracy and F1 score by 1.4 and 1.1 percentage points respectively; compared with S4VM, by 1.3 and 3.1 percentage points respectively; compared with SSWL, by 0.7 and 1.5 percentage points respectively. The experimental results indicate that increasing weight diversity improves the classification performance of the model, confirming its positive effect on the classification performance of UDEED+.
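The abstract says the weight diversity loss is built from the cosine similarity of the base learners' weights but does not give its exact form. The following is a minimal sketch of one plausible reading, assuming each base learner is a linear model with a single weight vector and that the loss is the mean pairwise cosine similarity. The function name `weight_diversity_loss`, the `(m, d)` array layout, and the averaging scheme are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def weight_diversity_loss(weights, eps=1e-12):
    """Mean pairwise cosine similarity between base-learner weight vectors.

    Assumption (not from the paper): `weights` has shape (m, d), one row per
    base learner. A lower value means the weight vectors point in more
    different directions, i.e. the ensemble is more diverse in weight space.
    """
    w = np.asarray(weights, dtype=float)
    m = w.shape[0]
    if m < 2:
        # A single learner has no pair to compare against.
        return 0.0
    # Normalize each row to unit length; eps guards against zero vectors.
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    unit = w / np.maximum(norms, eps)
    # Entry (i, j) of the Gram matrix is the cosine similarity of learners i, j.
    sim = unit @ unit.T
    # Average over the strict upper triangle so each pair is counted once.
    rows, cols = np.triu_indices(m, k=1)
    return float(sim[rows, cols].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical base learners over five features.
    print(weight_diversity_loss(rng.normal(size=(3, 5))))
```

In a training loop, a term like this would typically be added to the supervised loss with a trade-off coefficient so that minimizing the total objective pushes the weight vectors apart; the coefficient and the way it combines with the prediction-disagreement term are not specified in this record.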
Authors MAO Mingze (毛铭泽); CAO Ruihao (曹芮浩); YAN Chungang (闫春钢) (College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, PKU Core Journal, 2021, No. 9, pp. 2473-2480 (8 pages)
Funding National Key Research and Development Program of China (2017YFB1001804).
Keywords classification machine learning; unlabeled data; semi-supervised learning; ensemble learning; diversity
