Abstract
Structured data and unstructured text can be regarded as two different modalities. Data-to-text generation is an important cross-modal task in natural language generation: given structured data, the goal is to generate text that describes the key information the data contains. Recent work has mostly focused on generating descriptive text. Although this work has made progress, descriptive text only restates the structured data and provides no information gain. To address this problem, this paper studies data-to-analysis generation, that is, generating analytical text from structured data, and proposes a topic-aware cross-modal sequence-to-sequence model. Built on the encoder-decoder structure, the model introduces the topic information of the data table to keep the generated text topically consistent with the table and to improve the quality of the generated text. To evaluate the model, two real-world datasets are constructed and comparative experiments are run against six other models. The results show that the proposed model achieves the best performance.
Authors
张旭
王旭强
田雨婷
杨青
孟洁
ZHANG Xu, WANG Xuqiang, TIAN Yuting, YANG Qing, MENG Jie (Information Communication Company, State Grid Tianjin Electric Power Company, Tianjin 300010, China)
Source
《山东科技大学学报(自然科学版)》
CAS
Peking University Core Journals
2021, Issue 3, pp. 71-79 (9 pages)
Journal of Shandong University of Science and Technology (Natural Science)
Funding
Tianjin Science and Technology Program Project (18ZXZNGX00310)
Tianjin Electric Power Company Science and Technology Project (KJ19-1-38)
Keywords
natural language generation (自然语言生成)
structured data (结构化数据)
analytical text (分析性文本)
topic-aware (主题感知)
cross-modal (跨模态)