Abstract
This paper proposes SGN, a sequence generation model for answer acquisition in machine reading comprehension. First, SGN obtains a matching representation of the question and the passage in the question matrix space and generates the word vector of the current node with reference to latent question information. Next, a selection gate chooses the current word from either the passage or the dictionary, and the model learns and generates out-of-vocabulary (OOV) words on its own, which addresses inaccurate semantic expression. Finally, an improved coverage mechanism removes redundancy from the generated sequence and thereby improves readability. Experiments on the manually constructed SQuAD dataset show that the target sequences generated by SGN are more readable than those of the baseline seq2seq model and closer in meaning to the original text.
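The abstract does not give SGN's equations, so the following is only a rough sketch of the general idea behind a selection gate and a coverage term, written in the style of the well-known pointer-generator formulation rather than the authors' actual method. The gate value p_gen, the function names, and the toy passage are all assumptions made for illustration; in particular, the "improved" coverage mechanism of SGN is not specified in this record, and the standard min-based coverage loss is used in its place.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_distributions(p_gen, vocab_probs, attention, passage_tokens, vocab):
    """Blend a dictionary distribution with a copy distribution over the passage.

    p_gen is the (assumed) gate value in [0, 1]: the weight given to the
    dictionary. The remaining 1 - p_gen is spread over passage tokens
    according to attention, so OOV passage words can still be produced.
    """
    final = {w: p_gen * p for w, p in zip(vocab, vocab_probs)}
    for tok, a in zip(passage_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final

def coverage_loss(attention, coverage):
    """Standard coverage penalty: overlap between the current attention
    and the attention already accumulated on each passage position."""
    return sum(min(a, c) for a, c in zip(attention, coverage))

def update_coverage(coverage, attention):
    """Accumulate attention so repeated focus on a position is penalized."""
    return [c + a for c, a in zip(coverage, attention)]

# Toy example (hypothetical values): "Smiljan" is not in the dictionary.
vocab = ["the", "in", "<unk>"]
passage = ["Tesla", "was", "born", "in", "Smiljan"]
vocab_probs = softmax([1.0, 0.5, 0.2])
attention = softmax([0.1, 0.1, 0.1, 0.3, 2.0])

dist = mix_distributions(0.3, vocab_probs, attention, passage, vocab)
best = max(dist, key=dist.get)  # word with the highest blended probability
```

With a low gate value, most of the probability mass goes to the passage, so a word absent from the dictionary can still be selected by copying. The coverage vector would be updated after every decoding step and its loss added to the training objective to discourage repetition.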
Authors
霍欢
邹依婷
金轩城
黄君扬
薛瑶环
Huo Huan; Zou Yiting; Jin Xuancheng; Huang Junyang; Xue Yaohuan (School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 3, pp. 734-738 (5 pages)
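Funding
National Natural Science Foundation of China (61003031)
Shanghai Key Science and Technology Research Project (14511107902)
Shanghai Engineering Center Construction Project (GCZX14014)
Shanghai First-Class Discipline Construction Project (XTKX2012)
Open Project of the Shanghai Key Laboratory of Data Science (201609060003)
Hujiang Foundation Research Base Special Project (C14001)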
Keywords
answer acquisition
sequence generation model
OOV (out-of-vocabulary)
coverage mechanism