

Multimodal Dialogue Generation Method Integrating Emotion and Semantics
Abstract: In recent years, non-visual dialogue scenarios such as voice dialogue have become common in everyday life, for example the voice interaction of intelligent robots and customer-service systems that learn user needs through spoken conversation. Audio often carries emotional information, while text carries rich semantic information, so multimodal features that combine speech and text can mine semantic and emotional cues more fully and yield more informative dialogue responses. Current text-and-audio dialogue generation methods are mostly built on the traditional Seq2Seq model, and their responses suffer from low diversity and weak contextual coherence. This paper therefore proposes the AT-Transformer model for dialogue generation in the text-audio multimodal setting. Word embeddings are first constructed for the dialogue context and the reply, and VGGish is used to extract features from the dialogue audio; both representations are then fed into the AT-Transformer, where a multimodal attention mechanism fuses the two modalities; finally, an objective function is designed to improve the diversity of the generated sentences. Experiments evaluate emotional richness, contextual semantic relevance and sentence coherence. Compared with the best baseline model, emotion matching improves by 2% and semantic relevance by 0.5%.
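The abstract outlines a two-branch front end: a word-embedding matrix for the dialogue context and reply, and VGGish features for the dialogue audio. The paper's code is not reproduced here, so the following PyTorch sketch only illustrates that front end under stated assumptions: the torch.hub port of VGGish (harritaylor/torchvggish), the vocabulary size and the projection to a shared model dimension are illustrative choices, not the authors' exact configuration.

# Illustrative front end for the two modalities described in the abstract.
# Assumptions: the community torch.hub port "harritaylor/torchvggish" for
# VGGish, and made-up sizes (VOCAB_SIZE, EMB_DIM); neither comes from the paper.
import torch
import torch.nn as nn

VOCAB_SIZE = 30000   # assumed vocabulary size
EMB_DIM = 512        # assumed shared model dimension
VGGISH_DIM = 128     # VGGish emits one 128-d embedding per ~0.96 s audio frame

# Text branch: token ids -> word-embedding matrix for context and reply.
word_embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=0)

# Audio branch: a community PyTorch port of Google's VGGish.
vggish = torch.hub.load('harritaylor/torchvggish', 'vggish')
vggish.eval()

# Project VGGish frames into the same space as the word embeddings so both
# modalities can be consumed by the Transformer's attention layers.
audio_proj = nn.Linear(VGGISH_DIM, EMB_DIM)

def encode_turn(token_ids, wav_path):
    """token_ids: (seq_len,) LongTensor; wav_path: path to the utterance audio."""
    text_feats = word_embedding(token_ids)           # (seq_len, EMB_DIM)
    with torch.no_grad():
        audio_frames = vggish(wav_path)              # (n_frames, 128)
    audio_feats = audio_proj(audio_frames.float())   # (n_frames, EMB_DIM)
    return text_feats, audio_feats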
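The abstract states that the two modal features are fused inside a multimodal attention mechanism of the AT-Transformer, but it does not give the fusion formula. The block below is one plausible, hypothetical realization: text self-attention followed by cross-attention in which the text queries attend to the projected audio frames, with the cross-attended signal merged through a learned gate and a feed-forward layer. Treat it as a sketch of the general technique, not as the paper's actual layer.

# Hypothetical multimodal fusion block: text self-attention, then cross-attention
# over audio frames, merged with a learned gate. One plausible reading of
# "multimodal attention"; the paper's actual AT-Transformer layer may differ.
import torch
import torch.nn as nn

class MultimodalFusionBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, text, audio):
        # text:  (batch, text_len, d_model)   word-embedding features
        # audio: (batch, audio_len, d_model)  projected VGGish features
        t, _ = self.self_attn(text, text, text)
        text = self.norm1(text + t)
        a, _ = self.cross_attn(text, audio, audio)   # text queries attend to audio
        g = torch.sigmoid(self.gate(torch.cat([text, a], dim=-1)))
        fused = self.norm2(text + g * a)             # gated residual fusion
        return self.norm3(fused + self.ffn(fused))

# Example: fuse a 20-token turn with 5 audio frames.
block = MultimodalFusionBlock()
fused = block(torch.randn(2, 20, 512), torch.randn(2, 5, 512))  # -> (2, 20, 512)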
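Finally, the abstract only says that the objective function is designed to raise the diversity of the generated sentences, without specifying its form. A common diversity-promoting choice in dialogue generation, shown purely as an assumed illustration in the spirit of unlikelihood training (Welleck et al., 2020), adds a penalty on re-generating tokens that already appeared earlier in the response to the usual cross-entropy loss; the authors' actual objective may be entirely different.

# Hypothetical diversity-oriented objective: cross-entropy plus a token-level
# unlikelihood penalty on tokens already seen earlier in the target response.
# This is NOT the paper's loss; it only illustrates one way to raise diversity.
import torch
import torch.nn.functional as F

def diversity_objective(logits, targets, alpha=0.5, pad_id=0):
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len) gold token ids."""
    b, t, v = logits.shape
    log_probs = F.log_softmax(logits, dim=-1)

    # Likelihood term: ordinary token-level negative log-likelihood.
    nll = F.nll_loss(log_probs.reshape(-1, v), targets.reshape(-1),
                     ignore_index=pad_id)

    # Unlikelihood term: at each step, push down probability mass assigned to
    # tokens that already occurred earlier in the same target sequence.
    probs = log_probs.exp()
    penalty = logits.new_zeros(())
    n_candidates = logits.new_zeros(())
    for i in range(1, t):
        mask = torch.zeros(b, v, device=logits.device)
        mask.scatter_(1, targets[:, :i], 1.0)          # previously seen tokens
        mask.scatter_(1, targets[:, i:i + 1], 0.0)     # never penalize the gold token
        mask[:, pad_id] = 0.0
        p = (probs[:, i, :] * mask).clamp(max=1.0 - 1e-6)
        penalty = penalty - torch.log(1.0 - p).sum()
        n_candidates = n_candidates + mask.sum()
    penalty = penalty / n_candidates.clamp(min=1.0)
    return nll + alpha * penalty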
Authors: ZHANG Yiying (张翼英), MA Caixia (马彩霞), ZHANG Nan (张楠), LIU Yiyang (柳依阳), WANG Delong (王德龙) — College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China
Source: Journal of Tianjin University of Science & Technology (CAS), 2023, No. 3, pp. 52-60 (9 pages)
Funding: National Natural Science Foundation of China (Grant No. 61807024)
Keywords: multimodal; dialogue generation; Transformer model; text generation