Rumor Detection of Public Health Emergencies Based on Data Augmentation and Multi-Task Learning

Cited by: 3

Abstract [Objective] This paper introduces a multi-task learning model and a data augmentation method to address imbalanced data and scarce labeled data in rumor detection during public health emergencies. [Methods] First, text features of public health emergency rumors are extracted to build a replacement word list, and the CEDA method, built on an extended synonym table, is used to augment the imbalanced rumor dataset. A multi-task learning model is then constructed to fuse domain information from the sentiment classification and rumor detection tasks for public health emergencies: shared features are obtained with a Transformer, and features specific to the rumor detection task are obtained with a BiLSTM model, improving the accuracy of rumor detection. [Results] The proposed multi-task learning model achieved an F1 value of 0.972, which is 0.006 higher than the model trained on the imbalanced dataset and 0.007 higher than the single-task learning model; compared with the DC-CNN model, the F1 value improved by 0.024. [Limitations] The auxiliary task of the multi-task learning model covers only binary sentiment classification; negative sentiment needs to be classified at a finer granularity. [Conclusions] The method based on domain data augmentation and multi-task learning can effectively improve the classification performance of rumor detection for public health emergencies.
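
The augmentation step described in the abstract can be illustrated with a minimal sketch. This record does not give the details of CEDA, so the code below is only a generic synonym-replacement oversampler in that spirit, not the authors' method. The synonym table `SYNONYMS`, the function names `augment` and `balance`, the English tokens, and the label strings are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical domain synonym table (assumption). The paper's actual
# replacement word list is not reproduced in this record.
SYNONYMS = {
    "virus": ["pathogen", "coronavirus"],
    "cure": ["treat", "heal"],
    "vaccine": ["jab", "shot"],
}


def augment(tokens, synonyms, n_replace, rng):
    """Return a copy of tokens with up to n_replace words swapped for a synonym."""
    out = list(tokens)
    # Positions whose token has an entry in the synonym table.
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n_replace]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def balance(samples, synonyms, minority_label, seed=0):
    """Add augmented minority-class copies until both classes have equal size.

    Assumes binary labels. A minority sample with no replaceable word
    yields an unchanged copy.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority_count = len(samples) - len(minority)
    balanced = list(samples)
    if not minority:
        return balanced
    for _ in range(max(majority_count - len(minority), 0)):
        tokens, label = rng.choice(minority)
        balanced.append((augment(tokens, synonyms, 1, rng), label))
    return balanced
```

Calling `balance(data, SYNONYMS, "rumor")` on a list of `(tokens, label)` pairs keeps every original sample and appends augmented rumor samples until the two classes are the same size. A real pipeline for Chinese text would also need word segmentation and a domain-specific synonym table, both of which are outside this sketch.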

Authors: Zeng Ziming; Zhang Yu (School of Information Management, Wuhan University, Wuhan 430072, China)

Affiliation: School of Information Management, Wuhan University

Source: Data Analysis and Knowledge Discovery (数据分析与知识发现), indexed in EI, CSSCI, CSCD, and the Peking University Core Journals list; 2023, No. 11, pp. 56-67 (12 pages)

Funding: This work is one of the research outcomes of a National Social Science Fund of China project (Grant No. 21BTQ046).

Keywords: Public Health Emergencies; Rumor Detection; Data Augmentation; Multi-Task Learning; Shared Transformer

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]; G350 [Culture and Science: Information Science]
