Abstract
[Purpose/significance] Government online political inquiry platforms are one of the important channels through which government departments learn about public opinion. To improve the accuracy of classifying inquiry messages and to cope with poor-quality, scarce message data, this study compares the classification performance of several BERT-derived models combined with text augmentation techniques and explores the reasons for the differences. [Method/process] An integrated comparison model for classifying online political inquiry messages is designed. For text augmentation, EDA (Easy Data Augmentation) and SimBERT-based augmentation are compared; for text classification, several pre-trained language models derived from BERT (such as ALBERT and RoBERTa) are compared. [Result/conclusion] The experiments show that the classification model combining RoBERTa with SimBERT text augmentation performs best, reaching an F1 score of 92.05% on the test set, 2.89% higher than the BERT-base model without text augmentation. In addition, SimBERT text augmentation raises the F1 score by 0.61% on average compared with no augmentation. The experiments demonstrate that the model combining RoBERTa with SimBERT text augmentation effectively improves multi-class text classification and offers a useful reference for similar problems.
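As a rough illustration of the EDA side of the comparison, the sketch below implements two of the four standard EDA operations (random swap and random deletion) on an already tokenized message. The function names, the `alpha` and `n_aug` parameters, and the alternation between the two operations are assumptions made for this example; the abstract does not state which EDA operations or settings the authors used, and synonym replacement and random insertion are left out because they need a synonym dictionary.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps randomly chosen position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; never return an empty list."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(tokens, n_aug=4, alpha=0.1, seed=0):
    """Produce n_aug variants, alternating swap and deletion (hypothetical schedule)."""
    rng = random.Random(seed)
    n_swaps = max(1, int(alpha * len(tokens)))
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            variants.append(random_swap(tokens, n_swaps, rng))
        else:
            variants.append(random_deletion(tokens, alpha, rng))
    return variants
```

Each variant keeps the label of the original message, so a small labeled set can be expanded before fine-tuning a classifier. For Chinese messages the input would first need word segmentation, which is outside this sketch.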
Authors
施国良
陈宇奇
Shi Guoliang; Chen Yuqi (Business School, Hohai University, Nanjing 211100)
Source
《图书情报工作》
CSSCI
Peking University Core Journals (北大核心)
2021, Issue 13, pp. 96-107 (12 pages)
Library and Information Service
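Funding
A research output of the Fundamental Research Funds for the Central Universities project "Research on Key Technologies of a Water Conservancy Knowledge Graph Based on Graph Databases" (Project No. B200207036).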
Keywords
political inquiry platform
text classification
text augmentation
BERT model