Abstract
[Objective] This paper introduces a multi-task learning model and a data augmentation method to address imbalanced and scarce labeled data in rumor detection during public health emergencies. [Methods] First, we extracted text features of public health emergency rumors to construct a replacement word list, and built the CEDA method on an extended synonym table to augment the imbalanced rumor dataset. Then, we constructed a multi-task learning model that fuses domain information from public health emergency sentiment classification and rumor detection: a Transformer extracts features shared by both tasks, while a BiLSTM extracts features unique to the rumor detection task, improving rumor detection accuracy. [Results] The proposed multi-task learning model achieved an F1 score of 0.972, which was 0.006 and 0.007 higher than the model trained on the imbalanced dataset and the single-task learning model, respectively, and 0.024 higher than the DC-CNN model. [Limitations] The auxiliary task of the multi-task learning model covers only binary sentiment classification; negative sentiment requires finer-grained classification. [Conclusions] Combining domain-specific data augmentation with multi-task learning effectively improves the classification performance of rumor detection for public health emergencies.
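The CEDA step in [Methods] balances the rumor dataset by generating synonym-replaced copies of minority-class texts. The paper's replacement word list and extended synonym table are not reproduced here, so the following is only a minimal Python sketch of generic EDA-style synonym replacement used for oversampling, not the authors' CEDA implementation. The `SYNONYMS` table, the functions `synonym_replace` and `balance_by_augmentation`, and the toy English tokens are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical extended synonym table: token -> candidate replacements.
# The paper's actual replacement word list is not reproduced here.
SYNONYMS = {
    "virus": ["pathogen"],
    "infection": ["contagion"],
    "vaccine": ["jab", "shot"],
}


def synonym_replace(tokens, synonyms, n, rng):
    """Return a copy of tokens with up to n table-listed tokens swapped for synonyms."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def balance_by_augmentation(samples, synonyms, n_replace=1, seed=0):
    """Oversample every minority class with synonym-replaced copies
    until it reaches the size of the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the augmentation reproducible
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    augmented = list(samples)  # originals come first, unchanged
    for label, count in counts.items():
        pool = [tokens for tokens, lab in samples if lab == label]
        for k in range(target - count):
            source = pool[k % len(pool)]  # cycle through the class's texts
            augmented.append((synonym_replace(source, synonyms, n_replace, rng), label))
    return augmented


if __name__ == "__main__":
    data = [
        (["the", "virus", "spreads", "through", "water"], "rumor"),
        (["vaccine", "trials", "continue"], "non-rumor"),
        (["infection", "rates", "fall"], "non-rumor"),
    ]
    for tokens, label in balance_by_augmentation(data, SYNONYMS):
        print(label, " ".join(tokens))
```

Topping every class up to the majority count is just one simple balancing policy chosen for illustration. Augmented copies are normally generated from the training split only, so that near-duplicates of training texts do not leak into evaluation.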
Authors
Zeng Ziming (曾子明); Zhang Yu (张瑜)
School of Information Management, Wuhan University, Wuhan 430072, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2023, No. 11, pp. 56-67 (12 pages)
Indexed in: EI, CSSCI, CSCD, Peking University Core Journals (北大核心)
Funding
This work is one of the research outcomes of the National Social Science Fund of China project (Grant No. 21BTQ046).