Abstract
With the continuous development of the economy and society, managing and protecting Thangka information has become increasingly important. To manage and protect Thangka textual information more conveniently, text classification in the Thangka domain is needed. For this task, this paper first proposes using BERT to encode sentences and obtain their contextual feature information, then applies a convolutional neural network to extract local semantic features, and finally performs classification through a fully connected layer. In experiments on a Thangka-domain text dataset, the F1 score reaches 90.54%, which is 3.22% higher than the TextCNN model and 1.99% higher than the BERT model. The experimental results demonstrate the effectiveness of BERT-CNN for Thangka text classification.
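The sketch below illustrates the kind of BERT-CNN pipeline the abstract describes: BERT produces contextual token embeddings, 1-D convolutions with several kernel sizes extract local semantic features, and a fully connected layer outputs class scores. This is a minimal illustration, not the authors' implementation; the pretrained model name, kernel sizes, dropout rate, and number of Thangka categories are placeholder assumptions.

```python
# Minimal BERT-CNN text classifier sketch (assumed hyperparameters, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer


class BertCNNClassifier(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", num_classes=8,
                 num_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        # BERT encoder: yields contextual embeddings for each token.
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # One 1-D convolution per kernel size over the token dimension.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(0.1)
        # Fully connected layer maps pooled conv features to class scores.
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d.
        x = out.last_hidden_state.transpose(1, 2)
        feats = []
        for conv in self.convs:
            c = F.relu(conv(x))                # (batch, filters, seq_len - k + 1)
            feats.append(c.max(dim=2).values)  # max-over-time pooling
        features = torch.cat(feats, dim=1)     # (batch, filters * len(kernel_sizes))
        return self.fc(self.dropout(features))


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertCNNClassifier().eval()
    batch = tokenizer(["唐卡是藏族文化中独具特色的绘画艺术形式。"],
                      padding=True, truncation=True, max_length=128,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([1, 8])
```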
Author
Wang Yu (School of Mathematics and Computer Science, Northwest University for Nationalities, Lanzhou 730030)
Source
Modern Computer (《现代计算机》), 2021, No. 33, pp. 99-104 (6 pages)
Keywords
pre-trained model
self-attention mechanism
bidirectional long short-term memory network
convolutional neural network