Text classification model based on BERT and label confusion
Abstract Current research on text classification focuses mainly on improving classification performance by optimizing the classifier, while the connection between labels and text remains underused. Although BERT handles text features very well, there is still room for improvement in extracting features from both text and labels. This paper combines BERT with a label confusion model (LCM) to propose BLC (Model Based on BERT and Label Confusion), a text classification model that further processes the features of both text and labels. The sentence vectors from every BERT layer and the word vectors from the last layer are combined with a bidirectional long short-term memory (Bi-LSTM) network to produce a text representation that replaces BERT's original text feature representation. Before entering the LCM, the labels pass through a self-attention network and a Bi-LSTM to strengthen the interdependencies among them, which improves the final classification performance. Experimental results on four text classification benchmark datasets demonstrate the effectiveness of the proposed model.
Authors HAN Bo (韩博); CHENG Weiqing (成卫青) (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition (《南京邮电大学学报(自然科学版)》), Peking University Core Journal, 2024, No. 3, pp. 100-108 (9 pages)
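Funding Supported by the National Natural Science Foundation of China, General Program (62172236), and the Jiangsu Province Graduate Education and Teaching Reform Project (JGZZ19_038).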
Keywords text classification; BERT; label confusion model (LCM); bidirectional long short-term memory (Bi-LSTM); self-attention network
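
The abstract gives no formulas, so the following is only a minimal sketch of the general label-confusion idea as it is commonly described: a similarity score between the text representation and each label representation is blended with the one-hot target to form a softer training target, and the classifier is trained to match that target. It is not the authors' BLC implementation. The dot-product similarity, the weight `alpha`, and all function names are assumptions made for illustration, and plain Python lists stand in for the BERT, Bi-LSTM, and self-attention encoders.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_label_distribution(text_vec, label_vecs, true_label, alpha=4.0):
    """Blend the one-hot target with a text-to-label similarity distribution.

    text_vec   : text representation (assumed to come from some encoder)
    label_vecs : one representation per label (assumed, same dimension)
    true_label : index of the gold label
    alpha      : hypothetical weight controlling how much the one-hot target dominates
    """
    # Dot-product similarity between the text and every label (an assumption).
    sims = [sum(t * l for t, l in zip(text_vec, lv)) for lv in label_vecs]
    confusion = softmax(sims)
    one_hot = [1.0 if i == true_label else 0.0 for i in range(len(label_vecs))]
    # Softer target: the gold label still dominates, similar labels get some mass.
    return softmax([alpha * o + c for o, c in zip(one_hot, confusion)])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), usable as a loss between the soft target p and the prediction q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy example: label 1 is close to the gold label 0, label 2 is unrelated.
text = [1.0, 0.0]
labels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
target = simulated_label_distribution(text, labels, true_label=0)
prediction = softmax([2.0, 1.0, 0.1])
loss = kl_divergence(target, prediction)
```

In this toy setup the soft target keeps most of its mass on the gold label while giving the similar label more weight than the unrelated one, which is the behaviour a one-hot target cannot express.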