Abstract
Question generation aims to produce meaningful and fluent questions, which can offset the shortage of annotated question-answer corpora by augmenting the available data. Taking unannotated text with optional answers as input, question generation falls into two types depending on whether answers are provided: answer-aware and answer-agnostic. Generating questions is challenging even when answers are provided, and generating high-quality questions without them is harder still, for both humans and machines. To address this issue, we propose a novel end-to-end model called question generation with answer extractor (QGAE), which transforms answer-agnostic question generation into answer-aware question generation by directly extracting candidate answers. This approach makes effective use of unlabeled data to generate high-quality question-answer pairs, and its end-to-end design makes it more convenient than multi-stage methods, which require at least two pre-trained models. Moreover, our model achieves better average scores and greater diversity. Our experiments show that QGAE achieves significant improvements in generating question-answer pairs, making it a promising approach for question generation.
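The idea of turning answer-agnostic generation into answer-aware generation can be illustrated with a minimal sketch. This is not the QGAE model: the function names are hypothetical, and a crude rule-based extractor and a fill-in-the-blank template stand in for the learned components, purely to show the data flow from passage to candidate answers to question-answer pairs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


def extract_candidate_answers(passage: str) -> List[str]:
    """Toy extractor (assumption): capitalized words and numbers are candidates.

    A learned answer extractor would replace this heuristic.
    """
    candidates: List[str] = []
    for token in passage.split():
        word = token.strip(".,;:!?()\"'")
        if word and (word[0].isupper() or word.isdigit()) and word not in candidates:
            candidates.append(word)
    return candidates


def template_question(passage: str, answer: str) -> str:
    """Toy answer-aware generator (assumption): blank out the answer in its sentence."""
    for sentence in passage.split("."):
        if answer in sentence:
            return sentence.strip().replace(answer, "_____", 1) + "?"
    return "What is " + answer + "?"


def generate_qa_pairs(
    passage: str,
    generator: Callable[[str, str], str] = template_question,
) -> List[QAPair]:
    """Answer-agnostic input, answer-aware generation: extract first, then generate."""
    return [QAPair(generator(passage, a), a) for a in extract_candidate_answers(passage)]


if __name__ == "__main__":
    for pair in generate_qa_pairs("Marie Curie won the Nobel Prize in 1903."):
        print(pair.answer, "->", pair.question)
```

In this sketch the extractor and generator are separate functions only for readability; the abstract describes a single end-to-end model rather than two chained pre-trained models.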
Authors
Linfeng Li (李林枫)
Licheng Zhang (张立成)
Chiwei Zhu (朱池苇)
Zhendong Mao (毛震东)
Affiliations: School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
Source
Journal of University of Science and Technology of China
CAS
CSCD
PKU Core Journal
2024, No. 1, pp. 11-18, 10, I0001 (10 pages in total)
JUSTC
Funding
Supported by the Fundamental Research Funds for the Central Universities (WK3480000010, WK3480000008).
Keywords
deep learning
natural language processing
answer-agnostic question generation
answer extraction