
Research on Text Classification in the Thangka Domain Based on BERT
Abstract  With the continuous development of the economy and society, managing and protecting Thangka information has become increasingly important. To manage and protect Thangka text information more conveniently, text classification in the Thangka domain is necessary. For this task, the paper first uses BERT to encode sentences and obtain contextual feature information, then applies a convolutional neural network to extract local semantic features, and finally performs classification through a fully connected layer. In experiments on a Thangka-domain text dataset, the model reaches an F1 value of 90.54%, which is 3.22% higher than the TextCNN model and 1.99% higher than the BERT model. The experimental results demonstrate the effectiveness of BERT-CNN for Thangka text classification.
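The abstract outlines a BERT-plus-CNN pipeline: BERT supplies contextual token representations, a convolutional layer extracts local semantic features, and a fully connected layer produces the class scores. The following is a minimal sketch of such a model, assuming a PyTorch and Hugging Face "transformers" implementation; the kernel sizes, filter count, and the "bert-base-chinese" checkpoint are illustrative assumptions, not details taken from the paper.

# Minimal BERT-CNN text classifier sketch (assumed implementation, not the authors' code).
import torch
import torch.nn as nn
from transformers import BertModel

class BertCNNClassifier(nn.Module):
    def __init__(self, num_classes, bert_name="bert-base-chinese",
                 num_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        # BERT encodes every token together with its sentence-level context.
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Parallel 1-D convolutions extract local (n-gram-like) semantic features.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        # Fully connected layer maps the pooled features to class scores.
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) from BERT -> (batch, hidden, seq_len) for Conv1d.
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = h.transpose(1, 2)
        # Convolve, apply ReLU, then max-pool over the sequence for each kernel size.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

Max-pooling over the sequence keeps the strongest response of each filter, which is the usual way a TextCNN-style head is combined with a contextual encoder before the final classification layer.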
Author  Wang Yu (王昱), School of Mathematics and Computer Science, Northwest University for Nationalities, Lanzhou 730030
Source  Modern Computer (《现代计算机》), 2021, No. 33, pp. 99-104 (6 pages)
Keywords  pre-trained model; bidirectional long short-term memory network; convolutional neural network; self-attention mechanism