Semi-supervised text classification algorithm with data augmentation and similar pseudo-labels
Cited by: 5
Abstract: To reduce dependence on labeled data and make full use of large amounts of unlabeled data, this paper proposes STAP, a semi-supervised text classification algorithm with data augmentation and similar pseudo-labels. The algorithm uses the EPiDA (easy plug-in data augmentation) framework and self-training to expand a small amount of labeled data. It uses consistency training and similar pseudo-labels to account for two relationships: that between unlabeled data and its augmented samples, and that between similar unlabeled samples with high confidence. Under the constraints of a supervised cross-entropy loss, an unsupervised consistency loss, and an unsupervised pair loss, it improves the quality of the unlabeled data. Experiments on four text classification datasets show that STAP clearly improves on other classical text classification algorithms.
Authors: Sheng Xiaohui; Shen Hailong (School of Science, Northeastern University, Shenyang 110819, China)
Affiliation: School of Science, Northeastern University
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core, 2023, No. 4, pp. 1019-1023, 1051 (6 pages)
Keywords: semi-supervised learning; text classification; data augmentation; similar pseudo-label
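
The abstract names three loss terms (supervised cross-entropy, unsupervised consistency, unsupervised pair loss) without giving their formulas. The sketch below shows one plausible way such terms could be combined; it is not the paper's formulation. The KL-divergence consistency term, the Bhattacharyya similarity used in the pair term, the thresholds `conf_threshold` and `sim_threshold`, and the weights `lambda_u` and `lambda_p` are all assumptions introduced here for illustration.

```python
import math

EPS = 1e-12  # guards against log(0)

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))

def bhattacharyya(p, q):
    """Similarity in [0, 1]; equals 1 when the two distributions match."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def supervised_loss(labeled_probs, labels):
    """Mean cross-entropy over labeled samples."""
    losses = [-math.log(p[y] + EPS) for p, y in zip(labeled_probs, labels)]
    return sum(losses) / len(losses)

def consistency_loss(unlabeled_probs, augmented_probs):
    """Mean KL between predictions on unlabeled samples and their augmentations."""
    pairs = list(zip(unlabeled_probs, augmented_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

def pair_loss(unlabeled_probs, conf_threshold=0.9, sim_threshold=0.8):
    """Assumed pair term: for each confident sample i and each other sample j
    whose prediction is similar enough, penalize the remaining dissimilarity."""
    total, count = 0.0, 0
    for i, p_i in enumerate(unlabeled_probs):
        if max(p_i) < conf_threshold:
            continue  # only high-confidence samples act as anchors
        for j, p_j in enumerate(unlabeled_probs):
            if i == j:
                continue
            sim = bhattacharyya(p_i, p_j)
            if sim >= sim_threshold:
                total += 1.0 - sim
                count += 1
    return total / count if count else 0.0

def total_loss(labeled_probs, labels, unlabeled_probs, augmented_probs,
               lambda_u=1.0, lambda_p=1.0):
    """Weighted sum of the three terms (the weights are hypothetical)."""
    return (supervised_loss(labeled_probs, labels)
            + lambda_u * consistency_loss(unlabeled_probs, augmented_probs)
            + lambda_p * pair_loss(unlabeled_probs))
```

In this sketch the pair term only activates when at least one unlabeled prediction exceeds the confidence threshold, which mirrors the abstract's emphasis on high-confidence similar samples; how the published algorithm selects and weights such pairs may differ.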