Abstract
To address the lack of research datasets in the field of language public opinion, an information source library for language public opinion was constructed to establish the sources and scope of such information, and the Weibo data it contains were collected to build a multimodal language public opinion dataset. Furthermore, a language public opinion recognition method based on multimodal fusion was proposed. Unimodal features are enhanced through attention mechanisms, and the dependencies between features of different modalities are learned to generate fine-grained multimodal representations. Experimental results show that the proposed method outperforms existing multimodal classification methods in accuracy and can effectively identify language public opinion information.
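The abstract does not give the architecture, so the following is only a minimal sketch of the general idea: self-attention within each modality, then attention across modalities, then a fused representation. It assumes two modalities (text and image), plain scaled dot-product attention without learned projections, and mean pooling followed by concatenation. The function names (`attend`, `fuse`) and the toy feature values are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each query produces a weighted average of `values`, with weights
    given by the softmax of its scaled dot products with `keys`.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def mean_pool(vectors):
    """Average a list of vectors into a single vector."""
    count = len(vectors)
    return [sum(v[j] for v in vectors) / count for j in range(len(vectors[0]))]


def fuse(text_feats, image_feats):
    """Hypothetical fusion: assumes both modalities share one feature size."""
    # Step 1 (assumed): enhance each modality with self-attention.
    text_enh = attend(text_feats, text_feats, text_feats)
    image_enh = attend(image_feats, image_feats, image_feats)
    # Step 2 (assumed): model cross-modal dependencies by letting each
    # modality attend to the other.
    text_ctx = attend(text_enh, image_enh, image_enh)
    image_ctx = attend(image_enh, text_enh, text_enh)
    # Step 3 (assumed): pool each stream and concatenate the results.
    return mean_pool(text_ctx) + mean_pool(image_ctx)


# Toy inputs: three text token vectors and two image region vectors.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.5, 0.5], [1.0, 0.0]]
fused = fuse(text_feats, image_feats)
print(len(fused), fused)
```

In a real model the queries, keys, and values would come from learned linear projections, and the fused vector would feed a classifier. Both are omitted here to keep the sketch dependency-free.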
Authors
LÜ Xueqiang (吕学强)
DONG Liang (董良)
TENG Shangzhi (滕尚志)
ZHANG Le (张乐)
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Source
《北京信息科技大学学报(自然科学版)》 (Journal of Beijing Information Science and Technology University)
2023, Issue 5, pp. 1-9 (9 pages)
Funding
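National Natural Science Foundation of China (62202061, 62171043)
Beijing Natural Science Foundation (4232025)
Research Project of the State Language Commission (ZDI145-10)
General Science and Technology Project of the Scientific Research Program of the Beijing Municipal Education Commission (KM202311232002)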
Keywords
language public opinion
dataset construction
attention mechanism
multimodal fusion
public opinion recognition