Abstract
Text generation is an active area of natural language processing. As the capacity to collect information keeps growing, people accumulate more and more structured data, such as tables. Coping with information overload, understanding what a table means, and describing its content are important problems for artificial intelligence, which motivates the task of table-to-text generation. In table-to-text generation, a language model takes table data as input and generates a corresponding text description of the table. The generated description should be fluent, should fully convey the information in the table, and must not deviate from the facts in the table. This paper first describes the background of the task and defines it in detail, analyzes its main difficulties, and introduces the mainstream research methods. Table-to-text generation faces two major questions: what to describe and how to describe it. The paper reviews the solutions that different researchers have proposed for these two questions and summarizes the characteristics, advantages, and disadvantages of the proposed models. It compares the performance of these models on mainstream datasets, groups them by model type, and carries out a horizontal comparison across them. It also introduces the evaluation methods commonly used in the field and summarizes their characteristics, advantages, and disadvantages. Finally, it discusses future development trends of the table-to-text generation task.
Authors
胡康
奚雪峰
崔志明
周悦尧
仇亚进
HU Kang; XI Xuefeng; CUI Zhiming; ZHOU Yueyao; QIU Yajin (School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000, China; Suzhou Smart City Research Institute, Suzhou, Jiangsu 215000, China)
Source
《计算机科学与探索》
CSCD
Peking University Core Journals (北大核心)
2022, Issue 11, pp. 2487-2504 (18 pages)
Journal of Frontiers of Computer Science and Technology
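Funding
National Natural Science Foundation of China (61876217, 62176175)
Jiangsu Province "Six Talent Peaks" High-Level Talent Project (XYDXX-086)
Suzhou Science and Technology Program Project (SGC2021078)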
Keywords
natural language processing
text generation
structured data
table-to-text generation